Abstract. Resource allocation for cloud services is a complex task due
to the diversity of the services and the dynamic workloads. One way to
address this is by overprovisioning, which results in high cost due to the
unutilized resources. A much more economical approach, relying on the
stochastic nature of the demand, is to allocate just the right amount
of resources and use additional, more expensive mechanisms in case of
overflow situations where demand exceeds the capacity. In this paper
we study this approach and show, both by comprehensive analysis for
independent normally distributed demands and by simulation on synthetic
data, that it is significantly better than currently deployed methods.


1 Introduction 

The recent rapid development of cloud technology gives rise to “many-and-diverse”
services being deployed in datacenters across the world. The allocation
of the available resources in the various locations to these services has a critical 
impact on the ability to provide a ubiquitous cost-effective high quality service. 
There are many challenges associated with optimal service placement due to 
the large scale of the problem, the need to obtain state information, and the 
geographical spreading of the datacenters and users. 

One intriguing problem is the fact that the service resource requirement
changes over time and is not fully known at the time of placement. A popular
way of addressing this important problem is over-provisioning, that is, allocating
resources for the peak demand. Clearly, this is not a cost-effective approach, as
many of the resources are unused most of the time. An alternative approach is
to model the service requirement as a stochastic process with known parameters
(these parameters can be inferred from historical data).

Much of the previous work that used stochastic demand modeling
attempted to minimize the probability of an overflow event and not the cost of
this overflow. Their solution was based on solving the Stochastic Bin Packing
(SBP) problem, where the goal is to pack a given set of random items into a
minimum number of bins such that the overflow probability of each bin does not
exceed a given value.¹

¹ [3,4] also looked at the online SBP problem, where the items (services in our case)
are assigned to some bin (datacenter) as they arrive.

Thus, in these works there is no distinction between cases with marginal and
substantial overflow of the demand, and under this modeling one again prepares
for the worst and tries to prevent an overflow. However, in reality it is not cost
effective to allocate resources according to the worst case scenario. Instead, one 
often wishes to minimize the cost associated with periods of insufficient resource 
availability, which is commonly dealt with by either diverting the service request 
to a remote location or dynamically buying additional resources. In both cases, 
the cost associated with such events is proportional to the amount of unavailable 
resources. Thus, our aim is to minimize the expected overflow of the demand over 
time. 

We deviate from previous work in two ways. First, we look at a stochastic
packing problem where the number of bins is given, e.g., a company already
has two datacenters where services can be placed, and one would like to place
services in these two datacenters optimally. Second, as we mentioned before, we
look at a more practical optimality criterion where we do not wish to minimize
the probability of an overflow, but instead its expected deviation.

To better understand this, assume that the resource we are optimizing is the
bandwidth consumption of the service in a datacenter, where we have a prebooked
bandwidth for each datacenter. Since we are dealing with stochastic bandwidth
demand, with some probability the traffic will exceed the prebooked bandwidth
and will result in an expensive overpayment that depends on the amount of
oversubscribed traffic. Obviously, in such a case minimizing the probability of
oversubscription may not give the optimal cost, and what we need to minimize is
the expected deviation from the prebooked bandwidth. We therefore term our problem
SP-MED (stochastic packing with minimum expected deviation).

We analyze the case of independent normal distributions and develop algorithms
for the optimal partition between two or more datacenters that need not
be identical. We prove the correctness of our algorithm by separating the problem
into two: one dealing with the continuous characteristics of the stochastic
objective function and the other dealing with the discrete nature of the
combinatorial algorithm. This approach results in a clean and elegant algorithm and
proof.

In fact, our proof technique reveals that the algorithms we developed hold for
a large family of optimization criteria, and thus may be applicable to other
optimization problems. In particular, we study two other natural cost functions that
correspond to minimizing overflow probability (rather than expected overflow
deviation). We show that these cost functions also fall into our general framework,
and therefore the same algorithm works for them as well. As will become
clear later, our requirements from the cost functions are quite natural,
and we believe our methods will prove useful for many other applications.


2 Related work 

Early work on VM placement (e.g., [5]) models the problem as a deterministic
bin packing problem, namely, for every service there is an estimate of
its deterministic demand. Stochastic bin packing was first suggested by Kleinberg,
Rabani and Tardos [1] for statistical multiplexing. [1] mostly considered
Bernoulli-type distributions. Goel and Indyk [2] further studied Poisson and
exponential distributions. Wang, Meng and Zhang [3] suggested to model real data
with the normal distribution. Thus, the input to their stochastic packing problem
is n independent services, where the demand of service i is distributed according
to a normal distribution with mean $\mu^{(i)}$ and variance $V^{(i)}$. The
output is some partition of the services to bins in a way that minimizes a target
function that differs from problem to problem.

A naive approach to such a problem is to reduce it to classical bin packing as
follows: for the i'th service define the effective size as the number $e^{(i)}$ such that
the probability that the demand of service i is larger than $e^{(i)}$ is small; then
solve the classical bin packing problem (or a variant of it) with item sizes
$e^{(1)}, \ldots, e^{(n)}$. However, it was shown in previous work that
this approach can be quite wasteful, mostly because it adds extra space
per service and not per bin. To demonstrate the issue, think about unbiased,
independent coin tosses. The probability that one coin toss significantly deviates from
its mean is 1, while the probability that the sum of 100 independent coin tosses
significantly deviates from its mean is exponentially small. When running independent trials
there is a smoothing effect that considerably reduces the chance of high
deviations. This can also be seen from the fact that the standard deviation of the sum of n
independent, identically distributed processes is only $\sqrt{n}$ times the standard
deviation of one process, and so the standard deviation grows much slower than the number of
processes.
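
The gap between padding per service and padding per bin can be made concrete with a small calculation. The sketch below assumes n independent, identically distributed normal services and a target overflow probability p; the function names and the numbers are illustrative assumptions, not taken from the works cited above.

```python
import math
from statistics import NormalDist


def per_service_padding(n, sigma, p):
    """Total extra space when each of the n services is padded separately,
    so that each one exceeds its own effective size with probability p."""
    z = NormalDist().inv_cdf(1.0 - p)
    return n * z * sigma


def per_bin_padding(n, sigma, p):
    """Extra space when the n independent services share one bin and only
    the bin total may exceed the capacity with probability p."""
    z = NormalDist().inv_cdf(1.0 - p)
    # The standard deviation of the sum is sqrt(n) * sigma.
    return math.sqrt(n) * z * sigma


n, sigma, p = 100, 10.0, 0.01  # hypothetical values
ratio = per_service_padding(n, sigma, p) / per_bin_padding(n, sigma, p)
print(ratio)  # equals sqrt(n) up to floating point rounding
```

With these numbers the per-service padding is about ten times larger than the per-bin padding, which is the smoothing effect described above.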

Breitgand and Epstein [4], building on [3], suggest an algorithm for stochastic
bin packing that takes advantage of this smoothing effect. The algorithm
assumes all bins have equal capacity. The algorithm first sorts the processes by
their variance to mean ratio (VMR), i.e.,
$V^{(1)}/\mu^{(1)} \le \cdots \le V^{(n)}/\mu^{(n)}$. Then the
algorithm finds the largest prefix of the sorted list such that allocating that set
of services to the first bin makes the probability the first bin overflows at most
p. The algorithm then proceeds bin by bin, each time allocating a prefix of the
remaining services on the sorted list to the next bin. [3] show that if we allow
fractional solutions, i.e., we allow splitting services between bins, the algorithm
finds an optimal solution, and also show an online, integral version that gives a
2-approximation to the optimum.

3 The risk unbalancing principle: A bird's-eye overview of
our technique

In this bird's-eye overview we focus on the SP-MED problem, and in the next section
we generalize it to a much wider class of cost functions. As mentioned above,
we are given k bins with capacities $c_1, \ldots, c_k$. We are looking for a solution
that minimizes the expected deviation. As before, we study the case where the
stochastic demands are independent and normally distributed. We develop a
new general framework to analyze this problem that also sheds light on previous
work.

We first observe that an optimal fractional solution to SP-MED is also optimal
for any two bins. This is true because the cost function is the sum of the
expected deviation of all the bins, and changing the internal allocation of two
bins only affects them and not the other bins. Thus, one possible approach to
achieve an optimal solution is by repeatedly improving the internal division of
two bins. However, offhand, there is no reason to believe such a sequence of local
improvements efficiently converges to the optimal solution. Surprisingly, this is
indeed the case. Therefore, we first focus on the two bin case. Later (in Appendix
A) we will see that solving the k = 2 case implies a solution to the general case
of arbitrary k.

We recall that if we have n independent normally distributed services with
mean and variance $(\mu^{(i)}, V^{(i)})$, and we allocate the services with indices in
$I \subseteq [n]$ to the first bin, and the rest to the second bin, then the first bin is normally
distributed with mean $\mu_1 = \sum_{i \in I} \mu^{(i)}$ and variance
$V_1 = \sum_{i \in I} V^{(i)}$, while the second one is normally distributed with
mean $\mu - \mu_1$ and variance $V - V_1$, where
$\mu = \sum_{i=1}^{n} \mu^{(i)}$ and $V = \sum_{i=1}^{n} V^{(i)}$. We wish to
minimize the total expected deviation of the two bins.

Let us define the function $Dev : [0,1] \times [0,1] \to \mathbb{R}$ such that $Dev(a,b)$ is the
total expected deviation when bin one is distributed with mean $a\mu$ and variance
$bV$ and bin two with mean $(1-a)\mu$ and variance $(1-b)V$. Figure 1 (left) depicts
Dev(a,b) for two bins with equal capacity. Dev has a reflection symmetry around
the a = b line, that corresponds to the fact that we can change the order of the
bins as long as they have equal capacity and remain with a solution of the
same cost (see Appendix G.2 for a proof of a similar symmetry for the case of
different capacity bins). The points (0,0) and (1,1) correspond to allocating all
the services to a single bin, and indeed the expected deviation there is maximal.

The point $(\frac{1}{2}, \frac{1}{2})$ is a saddle point (see Appendix G.3 for a proof). Figure 1
(middle) shows a zoom in around this saddle point. There are big mountains in
the lower left and upper right quarters that fall down to the saddle point, and
also valleys going down from the saddle point to the bottom right and top left.
Going back to Figure 1 (left) we see that for every fixed b the function is convex
in a, with a single minimum point, and all these minimum points form the two
valleys mentioned above.

We now go back and consider the input $\{(\mu^{(i)}, V^{(i)})\}_{i=1}^{n}$. We can represent
service i with the pair $(a^{(i)} = \mu^{(i)}/\mu,\ b^{(i)} = V^{(i)}/V)$. If we split the input to the sets I
and $[n] \setminus I$, the expected deviation of this partition is
$Dev(\sum_{i \in I} a^{(i)}, \sum_{i \in I} b^{(i)})$.
Thus, the n input points induce $2^n$ possible solutions
$P_I = (\sum_{i \in I} a^{(i)}, \sum_{i \in I} b^{(i)})$
for each $I \subseteq [n]$, and our task is to find the partition I that minimizes Dev.
Sorting the services by their VMR is equivalent to sorting the vectors
$P^{(i)} = (a^{(i)}, b^{(i)})$ by their angle with the a axis. If the sorted list is
$P^{(1)}, \ldots, P^{(n)}$ with increasing angles, looking for a solution in a partition
that breaks the sorted sequence into two consecutive parts finds a solution among
$(0,0),\ P^{(1)},\ P^{(1)} + P^{(2)},\ \ldots,\ P^{(1)} + \ldots + P^{(n)} = (1,1)$.
If we connect a line between every two consecutive
points in the above sequence we get the sorted path, connecting (0,0) with (1,1).

A crucial observation is that all the $2^n$ possible solutions $P_I$ lie on or above
that sorted path. See Lemma 1 for a rigorous statement and proof and Figure
1 (right) for an example. The fact that the sorted path always lies beneath all
possible solutions is independent of the target function, and only depends on the
fact that we deal with the normal distribution. Actually, this is true for any
distribution where allocating a subset of services to a bin amounts to adding the
corresponding means and variances.

Fig. 1. The left figure depicts Dev(a,b) when $\mu = 160$, $V = 6400$ and $c_1 = c_2 = 100$.
The middle figure is a zoom in around the saddle point $(\frac{1}{2}, \frac{1}{2})$. The white lines represent
the zero level sets of the partial derivatives. The right figure depicts Dev(a,b) with
the $2^{10}$ possible partitions represented in orange. The bottom sorted path (that sorts
services by their VMR in increasing order) and the upper sorted path (that sorts
services by their VMR in decreasing order), together with their integral points, are in
black. Notice that all partition points are confined by both sorted paths.
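
The confinement of all partition points between the two sorted paths can be checked by brute force on a small instance. The following is a minimal sketch under stated assumptions: the six (mean, variance) pairs are made up for illustration, and the helper names `sorted_path` and `height` are hypothetical.

```python
from itertools import combinations


def sorted_path(points, reverse=False):
    """Prefix sums of the vectors sorted by slope b/a, i.e. by VMR."""
    ordered = sorted(points, key=lambda p: p[1] / p[0], reverse=reverse)
    path = [(0.0, 0.0)]
    for a, b in ordered:
        path.append((path[-1][0] + a, path[-1][1] + b))
    return path


def height(path, a):
    """Height of the piecewise-linear path at abscissa a."""
    for (a0, b0), (a1, b1) in zip(path, path[1:]):
        if a0 <= a <= a1:
            return b0 + (b1 - b0) * (a - a0) / (a1 - a0)
    return path[-1][1]  # a is (numerically) at the right end point


# Hypothetical input: one (mean, variance) pair per service.
services = [(10, 1), (20, 9), (5, 4), (15, 30), (8, 2), (12, 20)]
mu = sum(m for m, _ in services)
var = sum(v for _, v in services)
points = [(m / mu, v / var) for m, v in services]

bottom = sorted_path(points)               # increasing VMR
upper = sorted_path(points, reverse=True)  # decreasing VMR

eps = 1e-9  # tolerance for floating point rounding
inside = all(
    height(bottom, a) - eps <= b <= height(upper, a) + eps
    for r in range(len(points) + 1)
    for subset in combinations(points, r)
    for a, b in [(sum(p[0] for p in subset), sum(p[1] for p in subset))]
)
print(inside)
```

The check uses nothing about the cost function: it relies only on the fact that means and variances add up, which is the point made above.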

A second crucial observation, proved in Theorem 1, is that the optimal
fractional solution lies on the sorted path. In fact, the proof shows this is a general
phenomenon that holds for many possible target functions. What we need from
the cost function Dev we are trying to minimize is:

— Dev has a symmetry of reflection around a line that passes through a saddle
point $(a', \frac{1}{2})$.

— For every fixed b, Dev is strictly uni-modal in a (i.e., has a unique minimum
in a). We call the set of solutions $\{(m(b), b)\}$ the valley. The saddle point
lies on the valley.

— Dev restricted to the valley is strictly monotone for $b < \frac{1}{2}$ and $b > \frac{1}{2}$ with a
maximum at the saddle point that is obtained when $b = \frac{1}{2}$.

To see why these conditions suffice, consider an arbitrary fractional solution
(a,b). By the symmetry property we may assume without loss of generality that
$b \le \frac{1}{2}$. If (a,b) is to the left of the valley, we can improve the solution by keeping b and
moving a towards the valley, until we either hit the sorted path or the valley. If
we hit the valley first, we can improve the expected deviation by going down
the valley until we hit the sorted path. A similar argument applies when (a,b) is
to the right of the valley. The conclusion is that we can always find another fractional
solution that is at least as good as (a,b) and lies on the sorted path.

Thus, quite surprisingly, we manage to decouple the question into two separate
and almost orthogonal questions. The first question is what is the behavior of
the function Dev over its entire domain. Notice that Dev is a function of only
two variables, independent of n. This question does not depend on the input
$\{(\mu^{(i)}, V^{(i)})\}$ at all. We study Dev using analytic tools. The second question is
what is the geometric structure of the set of feasible points, and here we
investigate the input, completely ignoring the target function Dev. For this question
we use geometric intuition and combinatorial tools.

In Appendix B we prove our three cost functions fall into the above framework.
The discussion above also gives an intuitive geometric interpretation of
the results in [3,4]. We believe the framework is also applicable to many other
target functions.

There is an intuitive explanation why the optimal fractional solution lies on
the sorted path. Sorting services by their VMR essentially sorts them by their
risk, where risk is the amount of variance per one unit of expectation.
Partitioning the sorted list corresponds to putting all the low risk services in one bin, and
all the high risk services in the other. We call this the risk unbalancing principle.
Intuitively, we would like to give the high risk services as much spare capacity
as possible, and we achieve that by grouping all low risk services together and
giving them less spare capacity. In contrast, balancing risk amounts to taking
the point $(\frac{1}{2}, \frac{1}{2})$ (for the case where the two bins have equal capacities). Having
the geometric picture in mind, we immediately see that this saddle point is not
optimal, and can be improved by taking the point on the valley that intersects
the sorted path.

The technique also applies to bins having different capacities that previous
work did not analyze, and reveals that we should allocate low risk services to
the bin with the lower capacity. This follows from a neat geometric argument
that when $c_1 < c_2$ the optimal fractional solution lies on the sorted path (that
sorts services by their VMR in increasing order) and not the upper sorted path
(that sorts services by their VMR in decreasing order). See Section 5.2 and
Theorem 1 for more details. If we have k bins, we should sort the bins by their
capacity and the services by their VMR, and then allocate consecutive segments
of the sorted list of services to the bins, with lower risk services allocated to
smaller capacity bins. This double-sorting again intuitively follows from the risk
unbalancing principle, trying to preserve as much spare capacity as possible for the high risk
services.² We prove the double-sorting algorithm in Appendix A.

Finally, we are left with the question of finding the right partition of the
sorted list. With two bins one can simply try all n partition points. However,
with k bins there are $\binom{n}{k-1} = O(n^{k-1})$ possible choices of partition points. We show a
dynamic programming algorithm that finds the best partition in poly(n) time.
The optimal solution can probably be found much faster and we have candidate
algorithms that work well in practice and we are currently working on formally
proving their correctness.

² In a similar vein, if we have total capacity c to split between two bins, it is always
better to make the bin capacities as unbalanced as possible, i.e., the minimum
expected deviation decreases as $c_2 - c_1$ increases. In particular the best choice is
having a single bin with capacity c and the worst choice is splitting the capacities
evenly between the two bins. Obviously, in practice there might be other reasons to
have several bins, but if there is tolerance in each bin capacity it is always better to
minimize the number of bins. We give a precise statement and proof in the appendix.

4 Formal treatment 

The input to the problem consists of k and n, specifying the number of bins and
services, integers $\{c_j\}_{j=1}^{k}$ specifying the bin capacities, and values
$\{(\mu^{(i)}, V^{(i)})\}_{i=1}^{n}$, where the demand distribution $X^{(i)}$ of service i is
normal with mean $\mu^{(i)}$ and variance $V^{(i)}$, and $X^{(1)}, \ldots, X^{(n)}$ are
independent. The output is a partition of [n] to k disjoint
sets $S_1, \ldots, S_k \subseteq [n]$, where $S_j$ includes the indices of the services that need to be
allocated to bin j. Our goal is to find a partition minimizing a given cost function
$D(S_1, \ldots, S_k)$. The optimal (integral) partition is the partition with minimum
cost among all possible partitions.

We let c denote the total capacity, $\mu$ the total demand and V the total
variance, i.e., $c = \sum_{j=1}^{k} c_j$, $\mu = \sum_{i=1}^{n} \mu^{(i)}$ and
$V = \sum_{i=1}^{n} V^{(i)}$. The value $c - \mu$
represents the total spare capacity we have. The (total) standard deviation is
$\sigma = \sqrt{V}$, which represents the standard deviation of the input had all services
been put into a single bin. We let $\Delta$ denote the spare capacity in units of the
standard deviation, i.e., $\Delta = \frac{c - \mu}{\sigma}$. We assume that $c > \mu$.

We remind the reader that the sum of independent normal distributions
with mean $\mu^{(i)}$ and variance $V^{(i)}$ is normal with mean $\sum_i \mu^{(i)}$ and variance
$\sum_i V^{(i)}$. Consider a partition $S_1, \ldots, S_k$. Then, the demand distribution of bin
j is normal with mean $\mu_j = \sum_{i \in S_j} \mu^{(i)}$ and variance
$V_j = \sum_{i \in S_j} V^{(i)}$. The
standard deviation of bin j is $\sigma_j = \sqrt{V_j}$ and its spare capacity in units of its
standard deviation is $\Delta_j = \frac{c_j - \mu_j}{\sigma_j}$. Following previous work, we assume that for
every i, $\mu^{(i)} > 0$ and $V^{(i)}$ is small enough compared with $(\mu^{(i)})^2$ so that the
probability of getting negative demand in service i is negligible.
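
As a reading aid, the per-bin quantities that the cost functions are built from can be written in closed form. The identity below is the standard formula for the expected excess of a normal variable over a threshold; it is stated here under the notation of this section and is not quoted from the paper. The symbols $\varphi$ and $\Phi$ (standard normal density and cumulative distribution function) and $X_j$ (the total demand of bin j) are introduced only for this block.

```latex
\[
  X_j \sim N(\mu_j, V_j), \qquad
  \sigma_j = \sqrt{V_j}, \qquad
  \Delta_j = \frac{c_j - \mu_j}{\sigma_j},
\]
\[
  \Pr[X_j > c_j] = 1 - \Phi(\Delta_j), \qquad
  \mathbb{E}\bigl[\max(0,\, X_j - c_j)\bigr]
  = \sigma_j \Bigl( \varphi(\Delta_j) - \Delta_j \bigl( 1 - \Phi(\Delta_j) \bigr) \Bigr).
\]
```

Both quantities depend on the bin only through $\sigma_j$ and $\Delta_j$, which is why the normalization that follows is convenient.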

Next we normalize everything with regard to the total mean $\mu$ and the total
variance V. The input to the function D is two vectors $a_1, \ldots, a_{k-1}$ and $b_1, \ldots, b_{k-1}$
s.t. $\sum_{j=1}^{k-1} a_j \le 1$ and $\sum_{j=1}^{k-1} b_j \le 1$. We think of the vectors
$a_1, \ldots, a_{k-1}$ and $b_1, \ldots, b_{k-1}$ as representing the fraction of mean and variance each bin takes. We
let $a_k = 1 - \sum_{j=1}^{k-1} a_j$ and $b_k = 1 - \sum_{j=1}^{k-1} b_j$ be the fraction of the last bin (that
takes all the remaining mean and variance). We let $\mu_j = a_j \mu$ and $V_j = b_j V$ for
j = 1,...,k and we define $\sigma_j = \sqrt{V_j} = \sqrt{b_j}\,\sigma$ (where $\sigma = \sqrt{V}$) and
$\Delta_j = \frac{c_j - \mu_j}{\sigma_j}$.
In this notation the cost function is $D(a_1, \ldots, a_{k-1}; b_1, \ldots, b_{k-1})$.

The function D is defined over all tuples $a_1, \ldots, a_{k-1}, b_1, \ldots, b_{k-1} \in [0,1]$ s.t.
$\sum_{j=1}^{k-1} a_j \le 1$ and $\sum_{j=1}^{k-1} b_j \le 1$. There are $k^n$ possible partitions that correspond
to the $k^n$ (possibly) different tuple values. We call a tuple integral if it is induced
by some partition. Our goal is to approximate the integral solution minimizing
D over all integral inputs.

4.1 What do we require from a cost function?

We can handle a wide variety of cost functions. Specifically, we require the
following from the cost function:

1. In any optimal solution to a k-bin problem, the allocation for any two bins is also
optimal. Formally, if $S_1, \ldots, S_k$ is optimal for a k-bin problem, then for any
j and j', the partition $S_j, S_{j'}$ is optimal for the two-bin problem defined by
the services in $S_j \cup S_{j'}$ and capacities $c_j, c_{j'}$. We remark that this is a natural
condition that is true for almost any cost function we know.

For k = 2 we require that:

2. $D(a,b) = D(1 - a - \frac{c_2 - c_1}{\mu},\ 1 - b)$. When $c_1 = c_2$ this simply translates to
$D(a,b) = D(1-a, 1-b)$ and there is no difference between allocating the
set $S_1$ to the first bin or to the second one.

3. For every fixed $b \in [0,1]$, D(a,b) has a unique minimum on $a \in [0,1]$, at
some point a = m(b), i.e., D is decreasing at a < m(b) and increasing at
a > m(b). We call the points on the curve $\{(m(b), b)\}$ the valley.

4. D has a unique maximum over the valley at the point $(m(\frac{1}{2}), \frac{1}{2})$. In fact, by
the symmetry above this point is $(\frac{1}{2} - \frac{c_2 - c_1}{2\mu}, \frac{1}{2})$. This means that D(m(b),b)
is increasing for $b < \frac{1}{2}$ and decreasing for $b > \frac{1}{2}$.
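
The shift that appears in requirements 2 and 4 can be recovered by a short calculation. The sketch below assumes that the cost of a bin depends only on its spare capacity $c_j - \mu_j$ and its variance, so that exchanging these two quantities between the bins leaves the cost unchanged; the symbol $a'$ denotes the mirrored mean fraction and is introduced only for this block.

```latex
\begin{align*}
  c_1 - a'\mu = c_2 - (1-a)\mu
    \quad&\Longrightarrow\quad a' = 1 - a - \frac{c_2 - c_1}{\mu}, \\
  a' = a
    \quad&\Longrightarrow\quad a = \frac{1}{2} - \frac{c_2 - c_1}{2\mu}.
\end{align*}
```

The first line says that bin one at $(a', 1-b)$ has the spare capacity and variance that bin two had at $(a, b)$; one can check that bin two at $(a', 1-b)$ then has spare capacity $c_1 - a\mu$, matching bin one at $(a, b)$. The second line gives the fixed point of the symmetry at $b = \frac{1}{2}$.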

4.2 Three cost functions 

We have already seen SP-MED, which minimizes the expected deviation. Previous
work on SBP (stochastic bin packing) studied the worst overflow probability
of a bin. We define SP-MWOP to be the problem that minimizes the worst
overflow probability of a bin. Another natural problem is what we call SP-MOP,
which minimizes the probability that some bin overflows. The three cost functions
are related but behave differently. For example, SP-MWOP is not differentiable
but the other two are. It is relatively simple to find the valley in SP-MED and
SP-MWOP but we are not aware of any explicit description of the valley in
SP-MOP.

Nevertheless, it is remarkable that all three cost functions fall into our framework,
and we prove that in Appendix B. The proof is not always simple, and we
have used Lagrange multipliers and the log concavity of the cumulative distribution
function of the normal distribution for proving this for SP-MOP. It follows
that all the machinery we develop in the paper holds for these cost functions.
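
To make the three objectives concrete, the following is a minimal sketch that evaluates each of them for a given allocation, assuming every bin has positive variance. It uses the standard closed forms for a normal variable; the function names and the `bins` representation (one `(mean, variance, capacity)` triple per bin) are assumptions made for illustration.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bin_stats(bins):
    """Return (sigma_j, Delta_j) for every (mean, variance, capacity) triple."""
    stats = []
    for mean, variance, capacity in bins:
        sigma = math.sqrt(variance)
        stats.append((sigma, (capacity - mean) / sigma))
    return stats


def sp_med(bins):
    """Sum over bins of the expected deviation E[max(0, X_j - c_j)]."""
    return sum(s * (phi(d) - d * tail(d)) for s, d in bin_stats(bins))


def sp_mwop(bins):
    """Worst overflow probability of a bin."""
    return max(tail(d) for _, d in bin_stats(bins))


def sp_mop(bins):
    """Probability that some bin overflows; the bins are independent
    because they hold disjoint sets of independent services."""
    return 1.0 - math.prod(1.0 - tail(d) for _, d in bin_stats(bins))


# Hypothetical allocation: total mean 160, total variance 6400, c1 = c2 = 100.
bins = [(80.0, 1600.0, 100.0), (80.0, 4800.0, 100.0)]
print(sp_med(bins), sp_mwop(bins), sp_mop(bins))
```

For any allocation the worst per-bin overflow probability is at most the probability that some bin overflows, which in turn is at most the sum of the per-bin probabilities.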

5 The two bin case 

We first consider the two bin case (k = 2). We shall later see (in Appendix A)
that solving the k = 2 case is key for solving the general problem for any k > 2.
As explained before, we decouple the question into one about the function D and
another about the structure of the integral points.

5.1 The sorted path 

We now consider the input $(\mu^{(1)}, V^{(1)}), \ldots, (\mu^{(n)}, V^{(n)})$. We represent service
i with the pair $(a^{(i)} = \mu^{(i)}/\mu,\ b^{(i)} = V^{(i)}/V)$. If we split the input to the sets I
and $[n] \setminus I$, then the first bin is normally distributed with mean $\mu \sum_{i \in I} a^{(i)}$
and variance $V \sum_{i \in I} b^{(i)}$. Thus, the n input points induce $2^n$ possible solutions
$P_I = (\sum_{i \in I} a^{(i)}, \sum_{i \in I} b^{(i)})$ for each $I \subseteq [n]$, and we call each such point an
integral point. Sorting the services by their VMR is equivalent to sorting the
vectors $P^{(i)} = (a^{(i)}, b^{(i)})$ by the angle they make with the a axis.

Definition 1. (The sorted path) Sort the services by their VMR in increasing
order and calculate the vectors $P^{(1)}, P^{(2)}, \ldots, P^{(n)}$. For i = 1,...,n define

$P^{[i]}_{bottom} = P^{(1)} + P^{(2)} + \cdots + P^{(i)}$ and

$P^{[i]}_{up} = P^{(n)} + P^{(n-1)} + \cdots + P^{(n-i+1)}$,

and also define $P^{[0]}_{bottom} = P^{[0]}_{up} = (0,0)$.

The bottom sorted path is the curve that is formed by connecting $P^{[i]}_{bottom}$
and $P^{[i+1]}_{bottom}$ with a line, for i = 0,...,n-1. The upper sorted path is the curve
that is formed by connecting $P^{[i]}_{up}$ and $P^{[i+1]}_{up}$ with a line, for i = 0,...,n-1. We
sometimes abbreviate the bottom sorted path and call it the sorted path.


The integral point $P^{[i]}_{bottom}$ on the bottom sorted path corresponds to
allocating the i services with the lowest VMR to the first bin and the rest to the second.
On the other hand, the integral point $P^{[i]}_{up}$ on the upper sorted path corresponds
to allocating the i services with the highest VMR to the first bin and the rest
to the second. A crucial, yet simple, observation (proven in the appendix):

Lemma 1. All the integral points lie within the polygon confined by the bottom 
sorted path and the upper sorted path. 


Lemma 1 is a key lemma of the paper. Unfortunately, we had to omit the
proof because of lack of space. We recommend the reader to look at the simple
proof that appears in the appendix. In fact,

Lemma 2. The set of fractional points coincides with the polygon confined by 
the bottom sorted path and the upper sorted path. 


5.2 The sorting algorithm 

We now present the algorithm for the k = 2 case. The algorithm for the general
case of arbitrary k is presented in Appendix A. The algorithm gets as input
n, $\{(\mu^{(i)}, V^{(i)})\}_{i=1}^{n}$, $c_1$, $c_2$ and outputs a partition $S_1, S_2$ of [n] minimizing cost.

We keep all notation as before. The algorithm works as follows: 

The sorting algorithm: 2 bins

— Sort the bins by their capacity such that $c_1 \le c_2$.

— Sort the services by their VMR such that $V^{(1)}/\mu^{(1)} \le \cdots \le V^{(n)}/\mu^{(n)}$.

— Calculate the points $P^{[0]} = (0,0), P^{[1]}, \ldots, P^{[n]} = (1,1)$ on the sorted path.

— Calculate $D(P^{[i]})$ for each $0 \le i \le n$ and find the index $i^*$ such that the
point $P^{[i^*]}$ achieves the minimal cost among all points $P^{[i]}$.

— Let $S_1 = \{1, \ldots, i^*\}$ and $S_2 = [n] \setminus S_1$. Output $(S_1, S_2)$.
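
The steps above can be sketched in a few lines for the SP-MED cost. This is an illustrative sketch, not the authors' implementation: it assumes the standard closed form for the expected overflow of a normal variable, works directly with means and variances instead of the normalized points, and uses 0-based service indices. The function names are hypothetical.

```python
import math


def expected_overflow(mean, variance, capacity):
    """E[max(0, X - capacity)] for X ~ Normal(mean, variance)."""
    if variance <= 0.0:
        return max(0.0, mean - capacity)
    sigma = math.sqrt(variance)
    delta = (capacity - mean) / sigma
    density = math.exp(-0.5 * delta * delta) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(delta / math.sqrt(2.0))  # P(Z > delta)
    return sigma * (density - delta * tail)


def sort_two_bins(services, c1, c2):
    """services: list of (mean, variance) with mean > 0.
    Returns (S1, S2, cost), where S1 goes to the smaller bin."""
    if c1 > c2:
        c1, c2 = c2, c1  # step 1: sort the bins by capacity
    # Step 2: sort the services by variance-to-mean ratio.
    order = sorted(range(len(services)),
                   key=lambda i: services[i][1] / services[i][0])
    total_mean = sum(m for m, _ in services)
    total_var = sum(v for _, v in services)
    # Steps 3-4: walk along the sorted path and keep the cheapest split.
    best_cost, best_split = math.inf, 0
    mean1 = var1 = 0.0
    for split in range(len(order) + 1):
        cost = (expected_overflow(mean1, var1, c1)
                + expected_overflow(total_mean - mean1, total_var - var1, c2))
        if cost < best_cost:
            best_cost, best_split = cost, split
        if split < len(order):
            m, v = services[order[split]]
            mean1 += m
            var1 += v
    # Step 5: the first best_split services in VMR order form S1.
    return order[:best_split], order[best_split:], best_cost


# Hypothetical input: four services with equal means and different variances.
s1, s2, cost = sort_two_bins(
    [(10.0, 1.0), (10.0, 25.0), (10.0, 4.0), (10.0, 100.0)], 25.0, 25.0)
print(s1, s2, cost)
```

The loop evaluates the cost at the n + 1 integral points of the sorted path, so after sorting the running time is linear in n.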




The optimal fractional point is the fractional point that minimizes cost. In
Section 3 we said, and soon will prove, that the optimal solution allocates low
risk services to one bin and the rest to the other. However, offhand, it is not
clear whether to allocate the smaller risk services to the lower capacity bin or
the higher capacity bin. In fact, offhand, it is not clear whether the optimal
solution is on the bottom sorted path or the upper sorted path, and it might even
depend on the input. Remarkably, we prove that it always lies on the bottom
sorted path, meaning that it is always better to allocate low risk services to the
smaller capacity bin and high risk services to the higher capacity bin. We gave
an intuitive explanation to this phenomenon in Section 3. We prove Theorem 1
in Appendix E.

Theorem 1. The optimal fractional point lies on the bottom sorted path. The 
optimal fractional solution splits at most one service between two bins. 


6 Simulation results 


In this section we present our simulation results for the two bin and k bin cases.
We run our simulation on bins with equal capacity. We compare the sorting
algorithms for SP-MED, SP-MWOP and SP-MOP to an algorithm we call BM
(Balanced Mean). The algorithm goes through the list, item by item, and
allocates each item to the bin which is less occupied, which is a natural benchmark
and also much better than other naive solutions like first-fit and first-fit
decreasing.³
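
A minimal sketch of the BM benchmark follows. It assumes that occupancy is measured by the sum of the means already placed in a bin (as the name Balanced Mean suggests) and that ties go to the lowest bin index; both choices, and the function name, are assumptions rather than details given in the text.

```python
def balanced_mean(means, k):
    """Assign items in list order to the bin with the smallest total mean."""
    loads = [0.0] * k
    bins = [[] for _ in range(k)]
    for item, mean in enumerate(means):
        target = min(range(k), key=loads.__getitem__)  # least occupied bin
        bins[target].append(item)
        loads[target] += mean
    return bins


print(balanced_mean([500.0] * 6, 2))  # [[0, 2, 4], [1, 3, 5]]
```

Since BM looks only at the means, it ignores the variance of the items entirely, which is what the sorting algorithm exploits.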

We show simulation results on synthetic data. We first generate our stochastic
input $\{(\mu^{(i)}, \sigma^{(i)})\}_{i=1}^{n}$ for n = 100 and for n = 500. Our sample space is a
mixture of three populations: all items have the same mean (we fixed it at
$\mu^{(i)} = 500$) but 50% had standard deviation picked uniformly from $[0, 0.1 \cdot \mu^{(i)}]$,
25% had standard deviation picked uniformly from $[0.1 \cdot \mu^{(i)}, 0.5 \cdot \mu^{(i)}]$ and 25%
had standard deviation picked uniformly from $[0.5 \cdot \mu^{(i)}, 0.9 \cdot \mu^{(i)}]$.

We then randomly generated 500 sample values $x_l^{(i)}$ for each $1 \le i \le n$ and
$1 \le l \le 500$ using the normal distribution $N[\mu^{(i)}, V^{(i)}]$ and from this we inferred
parameters $\hat{\mu}^{(i)}, \hat{V}^{(i)}$ best explaining the sample as a normal distribution. Both
the sorting algorithm and the BM algorithm got as input $\{(\hat{\mu}^{(i)}, \hat{V}^{(i)})\}_{i=1}^{n}$, as
well as $c_1 = \ldots = c_k = \frac{c}{k}$, and output their partition.

³ At first, we also wanted to compare our algorithm with variants of the algorithms
considered in [3,4] for the SBP problem. In both papers, the authors consider the
algorithms First Fit and First Fit Decreasing [8] with item size equal to the effective
size, which is the mean value of the item plus an extra value that guarantees that the
overflow probability is at most some given value p. Their algorithm chooses an
existing bin when possible, and otherwise opens a new bin. However, when the
number of bins is fixed in advance, taking effective size rather than size does not
change much. For a new item (regardless of its size or effective size) we keep choosing
the bin that is less occupied, but this time we measure occupancy with respect to
effective size rather than size. Thus, if elements come in a random order, the net
outcome of this is that the two bins are almost balanced and a new item is placed
in each bin with almost equal probability.

Fig. 2. Average cost of the sorting algorithm and the BM algorithm for SP-MED,
SP-MWOP and SP-MOP with two bins. The x axis measures $\frac{c}{\mu}$.

Fig. 3. Average cost of the BM algorithm divided by average cost of the sorting
algorithm for SP-MED, SP-MWOP and SP-MOP with 2 bins. The x axis measures $\frac{c}{\mu}$.
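
The input generation described above can be sketched as follows. Several details are assumptions made for illustration: each item picks its population independently with probabilities 0.5, 0.25 and 0.25 (the text only gives the percentages), the fitted parameters are the sample mean and the unbiased sample variance, and all names are hypothetical.

```python
import random
from statistics import fmean, variance


def generate_services(n, mean=500.0, rng=random):
    """Return n (mean, standard deviation) pairs drawn from the mixture:
    the standard deviation is uniform on [0, 0.1], [0.1, 0.5] or
    [0.5, 0.9] times the mean, with probability 0.5, 0.25, 0.25."""
    services = []
    for _ in range(n):
        u = rng.random()
        if u < 0.5:
            low, high = 0.0, 0.1
        elif u < 0.75:
            low, high = 0.1, 0.5
        else:
            low, high = 0.5, 0.9
        services.append((mean, rng.uniform(low * mean, high * mean)))
    return services


def sample_and_fit(services, slots=500, rng=random):
    """Draw `slots` normal samples per service and fit (mean, variance)."""
    samples = [[rng.gauss(mu, sigma) for _ in range(slots)]
               for mu, sigma in services]
    fitted = [(fmean(xs), variance(xs)) for xs in samples]
    return samples, fitted


rng = random.Random(0)  # fixed seed so that runs are repeatable
services = generate_services(100, rng=rng)
samples, fitted = sample_and_fit(services, slots=500, rng=rng)
print(len(samples), len(samples[0]), fitted[0])
```

The fitted pairs, rather than the true parameters, are what the two algorithms receive, mirroring the setup in which parameters are inferred from historical data.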

To check the suggested partitions, we viewed each sample $x_l^{(i)}$ as
representing an item instantiation in a different time slot. We then computed the cost
function. For example, for SP-MED, the deviation value for bin j at time slot l
is $\max\{0,\ 100 \cdot \frac{\sum_{i \in S_j} x_l^{(i)} - c_j}{\mu}\}$, i.e., the deviation is measured as a percent of the
total mean value $\mu$. We generated 20 such lists and calculated the average cost
for these 20 input lists for each algorithm. We ran the experiment for different
values of c.
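
A minimal sketch of this evaluation for SP-MED is given below. How the per-bin, per-slot deviations are aggregated is an assumption here (they are summed over the bins and averaged over the time slots), as are the function name and the data layout `samples[i][l]`.

```python
def empirical_sp_med(samples, partition, capacities, total_mean):
    """Average over time slots of the summed per-bin deviation, measured in
    percent of total_mean. samples[i][l] is the demand of service i in
    time slot l; partition[j] lists the services placed in bin j."""
    slots = len(samples[0])
    total = 0.0
    for l in range(slots):
        for services, capacity in zip(partition, capacities):
            load = sum(samples[i][l] for i in services)
            total += max(0.0, 100.0 * (load - capacity) / total_mean)
    return total / slots


# Tiny hypothetical instance: two services, two time slots, two bins.
samples = [[10.0, 30.0], [20.0, 20.0]]
print(empirical_sp_med(samples, [[0], [1]], [15.0, 15.0], 40.0))  # 31.25
```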

Figure 2 shows the 2 bins average cost of both algorithms for SP-MED,
SP-MWOP and SP-MOP as a function of $\frac{c}{\mu}$. As expected, the average cost decreases
as the value $\frac{c}{\mu}$ increases, i.e., as the total spare capacity increases. We also see
that the sorting algorithm outperforms BM. Figure 3 shows the average cost of
the BM algorithm divided by the average cost of the sorting algorithm for the
three problems, again as a function of $\frac{c}{\mu}$.

Figure 4 shows the 4 bins average cost and cost ratio of both algorithms for
SP-MED as a function of $\frac{c}{\mu}$ for n = 100 and n = 500. As we saw in the two bins
case, the average cost decreases as the value $\frac{c}{\mu}$ increases. We can also see that the
average deviation for n = 100 is higher than the deviation for n = 500. However,
the ratio between the BM algorithm's average cost and the sorting algorithm's
average cost for n = 100 is lower than the ratio we get for n = 500. Still, for both
values of n, the sorting algorithm outperforms BM.






























Fig. 4. The left figure depicts the average cost of the sorting algorithm and the BM
algorithm for SP-MED with four bins, both for n = 100 and for n = 500. The right
figure depicts the average cost of the BM algorithm divided by the average cost of the
sorting algorithm for n = 100 and for n = 500. The x axis measures $\frac{c}{\mu}$.
A The general case

We now analyze the general $k$ bin case using the results we have obtained for
the two-bin case. We first show that the optimal fractional solution is obtained by
first sorting the bins by their capacities and the services by their VMR,
and then partitioning the sorted list into $k$ consecutive parts.

Assume the bins are sorted by their capacity, $c_1 \le c_2 \le \ldots \le c_k$. A fractional
solution is called sorted if for every $j < j'$ and every two services $i$ and $i'$ such
that service $i$ (resp. $i'$) is allocated to bin $j$ (resp. $j'$) it holds that the VMR of
service $i$ is at most that of service $i'$.











Theorem 2. The optimal fractional solution is sorted. 


Proof. Assume there is an optimal solution $S_1, \ldots, S_k$ that is not sorted, i.e.,
there exist $j < j'$ and $i, i'$ such that service $i$ (resp. $i'$) is allocated to bin $j$
(resp. $j'$) and the VMR of service $i$ is strictly larger than that of service $i'$. By
Theorem 1 the sorted fractional solution for the problem defined by bins $j$ and
$j'$ is strictly better than the one offered by the solution $S_1, \ldots, S_k$. This means
the solution is not optimal for bins $j$ and $j'$, in contradiction to item 1 in
Section 4.1.


The fact that with two bins and different capacities $c_1 < c_2$, low risk services
should be allocated to the smaller capacity bin (see Theorem 1 and the discussion
before it) implies also that for $k$ bins lower risk services should be allocated to
lower capacity bins.

Next we present an algorithmic framework for the problem: 


The double sorting algorithm framework 

— Sort the bins by their capacity $c_1 \le c_2 \le \ldots \le c_k$.

— Sort the services by their VMR, $\mathrm{VMR}^{(1)} \le \cdots \le \mathrm{VMR}^{(n)}$.

— Use a partitioning algorithm to find $k - 1$ partition points $\ell_0 = 0 < \ell_1 <
\cdots < \ell_{k-1} < \ell_k = n$ and output $S_1, \ldots, S_k$ where $S_j = \{\ell_{j-1} + 1, \ldots, \ell_j\}$.

— Allocate the services in $S_j$ to bin number $j$ (with capacity $c_j$).


The algorithmic framework is not complete since it does not specify how to
find the partition points on the sorted path. With two bins there is only one
partition point, and we can check all $n - 1$ possible integral partition points.
With $k$ bins there are $\binom{n-1}{k-1}$ possible choices of partition points, and checking
all of them may be infeasible.
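The two-bin case of the framework can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`expected_deviation`, `vmr`, `double_sort_two_bins`) are hypothetical, services are given as `(mean, variance)` pairs with positive mean and variance, and the cost plugged in is the SP-MED expected deviation $\sigma_j g(\Delta_j)$ from Appendix B.1. Any other two-bin cost could be substituted.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def expected_deviation(mean, var, cap):
    """SP-MED cost of one bin: sigma * g(Delta), g(D) = phi(D) - D * (1 - Phi(D))."""
    sigma = math.sqrt(var)
    delta = (cap - mean) / sigma
    return sigma * (phi(delta) - delta * (1.0 - Phi(delta)))

def vmr(service):
    """Variance-to-mean ratio of a (mean, variance) pair."""
    mean, var = service
    return var / mean

def double_sort_two_bins(services, capacities):
    """Sort bins and services, then try every one of the n - 1 cut points."""
    c1, c2 = sorted(capacities)          # smaller bin first
    order = sorted(services, key=vmr)    # low-VMR services first
    best = None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        cost = (expected_deviation(sum(m for m, _ in left),
                                   sum(v for _, v in left), c1)
                + expected_deviation(sum(m for m, _ in right),
                                     sum(v for _, v in right), c2))
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best  # (cost, services for the smaller bin, services for the larger bin)
```

Calling `double_sort_two_bins([(10.0, 1.0), (10.0, 25.0), (5.0, 4.0), (20.0, 9.0)], (30.0, 25.0))` returns the cheapest split of the VMR-sorted list, with the low-VMR prefix assigned to the bin of capacity 25.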

We now describe a dynamic programming algorithm that solves the problem, 
and works under very general conditions. 




Finding an optimal integral partition 


For each $1 \le t \le \log k$, $1 \le i < i' \le n$ and $j = 1, \ldots, k$ keep the two values
$\mathrm{Partition}(2^t, i, i', j)$ and $D(2^t, i, i', j)$ that are defined as follows:

— Base case $t = 1$: $\mathrm{Partition}(2, i, i', j)$ is the best integral partition point
for the two-bin problem with inputs $X_i, \ldots, X_{i'}$ and capacities $c_j, c_{j+1}$.
$D(2, i, i', j)$ is its corresponding expected deviation.

— Induction step: Suppose we have built the tables for $t$; we show how to build
the tables for $t + 1$. We let $D(2^{t+1}, i, i', j)$ be

$$\min_{i < i'' < i'} \left\{ D(2^t, i, i'' - 1, j) + D(2^t, i'', i', j + 2^t) \right\}$$

and we let $\mathrm{Partition}(2^{t+1}, i, i', j)$ be the partition point that obtains the
minimum in the above equation.

The optimal partition to $k$ bins (assuming $k$ is a power of two) can be recovered
from the tables. For example, for 4 bins $\mathrm{Partition}(4, 1, n, 1)$ returns the middle
point $\ell_2$ of the partition, and the partition points within each half are obtained
by $\ell_1 = \mathrm{Partition}(2, 1, \ell_2 - 1, 1)$ and $\ell_3 = \mathrm{Partition}(2, \ell_2, n, 3)$.

It can be seen (by a simple induction) that $\mathrm{Partition}(2^t, i, i', j)$ gives the
middle point of the optimal integral solution on the sorted path to the problem
of dividing $X_i, \ldots, X_{i'}$ to $2^t$ bins with capacities $c_j, c_{j+1}, \ldots, c_{j+2^t-1}$. It can also
be verified that the running time of the algorithm is $O(n^3 k \log k)$.
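As an illustration of finding the partition points by dynamic programming, the sketch below uses a simpler left-to-right recurrence rather than the doubling tables described above, so it should be read as a substitute technique and not as the algorithm of the text. It assumes the cost is a sum of per-bin costs (as in SP-MED), that every bin receives at least one service, and that services are `(mean, variance)` pairs with positive entries. The names `bin_cost` and `best_partition` are hypothetical.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bin_cost(mean, var, cap):
    """Assumed per-bin cost: the SP-MED expected deviation sigma * g(Delta)."""
    sigma = math.sqrt(var)
    delta = (cap - mean) / sigma
    return sigma * (phi(delta) - delta * (1.0 - Phi(delta)))

def best_partition(services, capacities):
    """Best consecutive split of the VMR-sorted services over the sorted bins."""
    caps = sorted(capacities)
    order = sorted(services, key=lambda s: s[1] / s[0])
    n, k = len(order), len(caps)
    # Prefix sums of means and variances along the sorted list.
    pm, pv = [0.0], [0.0]
    for m, v in order:
        pm.append(pm[-1] + m)
        pv.append(pv[-1] + v)
    inf = float("inf")
    # D[j][i]: least cost of placing the first i services into the first j bins.
    D = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for p in range(j - 1, i):  # bin j receives services p+1 .. i
                cand = D[j - 1][p] + bin_cost(pm[i] - pm[p], pv[i] - pv[p], caps[j - 1])
                if cand < D[j][i]:
                    D[j][i], back[j][i] = cand, p
    # Walk the back-pointers to recover the partition points l_1 < ... < l_k = n.
    points, i = [], n
    for j in range(k, 0, -1):
        points.append(i)
        i = back[j][i]
    points.reverse()
    return D[k][n], points
```

This variant evaluates the per-bin cost $O(n^2 k)$ times and does not need $k$ to be a power of two; it relies on the cost being additive over bins, which the doubling scheme above also uses when it adds the two halves.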

We analyze the error the integral solution we find has compared with the
optimal fractional solution. The error analysis builds upon the error analysis of
the two bin case. Start with an optimal fractional solution $\pi$. By Theorem 2 we
know $\pi$ is sorted. $\pi$ defines some sorted fractional solution to the problem defined
by the first two bins. By the remark after Theorem 3 we know that one of the
integral points to the left or to the right of the fractional solution has a deviation
very close to that of $\pi$, and we fix the first partition point accordingly to make it
integral. Now assume after fixing $j$ stages we hold some fractional solution
that is integral on the first $j$ partition points and possibly fractional on the rest.
We look at bins $j + 1$ and $j + 2$ and find an integral partition point (to the left or
to the right of the next fractional partition point) that is almost as good as the
fractional one. Doing the above for $j = 1, \ldots, k - 1$ we end up with an integral
solution whose deviation is close to that of the optimal fractional solution, where
the error bound in Theorem 3 is multiplied by a factor of $k - 1$. We remark that
this also shows an efficient way of converting an optimal fractional solution to
an integral solution close to it.

The algorithm is quite general and works whenever we can solve the $k = 2$
case, and therefore may be seen as a reduction from arbitrary $k$ to the $k = 2$
case. The main point we want to stress is that even in this generality the
complexity is polynomial. The main disadvantage of the algorithm is that it
runs in about $n^3$ time. For specific cases (such as SP-MED and SP-MOP and
probably other natural variants) the optimal solution can probably be found
much faster, and we are currently working on proving the correctness of an
almost linear algorithm.


B Three cost functions 


B.1 Stochastic Packing with Minimum Expected Deviation
(SP-MED)


In Appendix G.1 we prove that the expected deviation of bin $j$ ($j = 1, \ldots, k$), denoted
by $\mathrm{Dev}_{S_j}$, is

$$\mathrm{Dev}_{S_j} = \sigma_j\left[\phi(\Delta_j) - \Delta_j\left(1 - \Phi(\Delta_j)\right)\right]$$

where $\phi$ is the probability density function (pdf) of the standard normal
distribution and $\Phi$ is its cumulative distribution function (CDF).$^4$ Denoting $g(\Delta) =
\phi(\Delta) - \Delta(1 - \Phi(\Delta))$ we see that $\mathrm{Dev}_{S_j} = \sigma_j\, g(\Delta_j)$. With two bins Dev is a
function from $[0,1]^2$ to $\mathbb{R}$ and $\mathrm{Dev}(a, b) = \sigma_1 g(\Delta_1) + \sigma_2 g(\Delta_2)$ where the first bin
has mean $a\mu$ and variance $bV$, the second bin has mean $(1 - a)\mu$ and variance
$(1 - b)V$ and $\sigma_j$, $\Delta_j$ are defined as above.
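The closed form can be checked numerically. The sketch below compares $\sigma_j g(\Delta_j)$ with a direct midpoint-rule estimate of $E[\max(0, X - c_j)]$ for $X \sim N(\mu_j, \sigma_j^2)$, taking $\Delta_j = (c_j - \mu_j)/\sigma_j$ as in Appendix H.1. The function names and the integration cutoff of 12 standard deviations are illustrative assumptions.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def g(delta):
    """g(Delta) = phi(Delta) - Delta * (1 - Phi(Delta))."""
    return phi(delta) - delta * (1.0 - Phi(delta))

def dev_closed_form(mean, var, cap):
    """Expected deviation sigma * g(Delta) with Delta = (cap - mean) / sigma."""
    sigma = math.sqrt(var)
    return sigma * g((cap - mean) / sigma)

def dev_numeric(mean, var, cap, steps=100_000):
    """Midpoint-rule estimate of E[max(0, X - cap)] for X ~ N(mean, var)."""
    sigma = math.sqrt(var)
    upper = max(cap, mean) + 12.0 * sigma  # the tail beyond this point is negligible
    h = (upper - cap) / steps
    total = 0.0
    for s in range(steps):
        x = cap + (s + 0.5) * h
        total += (x - cap) * phi((x - mean) / sigma) / sigma
    return total * h
```

For example, `dev_closed_form(100.0, 225.0, 110.0)` and `dev_numeric(100.0, 225.0, 110.0)` should agree to many decimal places.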


Lemma 3. Dev respects conditions 1-4 of Section 4.1.


Proof. We go over the conditions one by one. 


item 1: Let $S_1, \ldots, S_k$ be an optimal solution. Suppose there are $j$ and $j'$ for
which $S_j, S_{j'}$ are not the optimal solution for the two-bin problem defined
by the services in $S_j \cup S_{j'}$ and capacities $c_j, c_{j'}$. Change the allocation of the
solution $S_1, \ldots, S_k$ on bins $j$ and $j'$ to an optimal solution for the two-bin
problem. This change improves the expected total deviation of the two bins
while not affecting the expected deviation of any other bin. In total we get
a better solution, in contradiction to the optimality of $S_1, \ldots, S_k$.
item 2: In Appendix G.2 we prove $\mathrm{Dev}(a, b) = \mathrm{Dev}(1 - a - \frac{c_2 - c_1}{\mu}, 1 - b)$. We
remark that for $c_1 = c_2$ this simply says we can switch the names of the first
and second bin.
item 3: In Appendix G.3 we calculate $\frac{\partial^2 \mathrm{Dev}}{\partial a^2} = \mu^2\left[\frac{\phi(\Delta_2)}{\sigma_2} + \frac{\phi(\Delta_1)}{\sigma_1}\right] > 0$. It
follows that for any $0 < b < 1$, $\mathrm{Dev}(a)$ is convex and has a unique minimum.
The unique point $(m(b), b)$ on the valley is the one where $\Delta_1 = \Delta_2$.
item 4: We first explicitly determine what Dev restricted to the valley is as
a function $D(b) = \mathrm{Dev}(m(b), b)$ of $b$. As $\mathrm{Dev}(a, b) = \sigma_1 g(\Delta_1) + \sigma_2 g(\Delta_2)$ and
on the valley $\Delta_1 = \Delta_2$ we see that on the valley $\mathrm{Dev}(a, b) = (\sigma_1 + \sigma_2) g(\Delta_1)$.
However, $\sigma_1 + \sigma_2$ also simplifies to

$$\frac{c_1 - a\mu}{\Delta_1} + \frac{c_2 - (1 - a)\mu}{\Delta_2} = \frac{c - \mu}{\Delta_1}.$$

Altogether, we conclude that on the valley $\mathrm{Dev}(a, b) = (c - \mu)\frac{g(\Delta_1)}{\Delta_1}$ is a function of $\Delta_1$
alone.

Now it is a straightforward calculation that $\frac{\partial D}{\partial \Delta_1} = -(c - \mu)\frac{\phi(\Delta_1)}{\Delta_1^2} < 0$.
Also $\frac{\partial \Delta_1}{\partial b}$ is negative when $b < \frac{1}{2}$ and positive when $b > \frac{1}{2}$ (see Appendix
G.3). As $\frac{\partial D}{\partial b} = \frac{\partial D}{\partial \Delta_1} \cdot \frac{\partial \Delta_1}{\partial b}$ we see that $D(b)$ is increasing for $b < \frac{1}{2}$
and decreasing for $b > \frac{1}{2}$ as claimed. The saddle point $(a = \frac{1}{2} - \frac{c_2 - c_1}{2\mu}, b = \frac{1}{2})$
lies on the valley and is a maximum point for Dev restricted to the valley.
We remark that we could simplify the proof by using Lagrange multipliers
(as we do in Lemma 8). However, since here it is easy to explicitly find Dev
restricted to the valley we prefer the explicit solution. Later, we will not
be able to explicitly find the restriction to the valley and we use instead
Lagrange multipliers, which solve the problem with an implicit description of
the valley.

$^4$ A quick background on the normal distribution is given in Appendix F.

Lemma 4. The difference between the expected deviation in the integral point 
found by the sorting algorithm and the optimal integral (or fractional) point for 
SP-MED is at most ^ ^ • p. In particular, when L = o(n) and a is a 

constant, the error is $o(\mu)$.

Proof. We know from Theorem 3 that the difference is at most

$$\min\{|\nabla D(\xi_1)|, |\nabla D(\xi_2)|\} \cdot \frac{L}{n},$$

where $\xi_1 \in [O_1, OPT_f]$, $\xi_2 \in [OPT_f, O_2]$ and $O_1$ and $O_2$ are the two points on
the bottom sorted path between which $OPT_f$ lies. Using the partial derivatives
calculated in Appendix G.3 we see that

$$|\nabla(\sigma_2 g)(\Delta_2)| \le |\mu(1 - \Phi(\Delta_2))| + \left|\frac{\sigma}{2\sqrt{1 - b}}\phi(\Delta_2)\right| \le \mu + \frac{\sigma}{2\sqrt{1 - b}}\phi(\Delta_2).$$

Moreover, $\frac{\sigma}{2\sqrt{1 - b}}\phi(\Delta_2) = \frac{\sigma}{2\sqrt{1 - b}\,\Delta_2}\,\Delta_2\phi(\Delta_2)$ and a simple calculation shows
that the function $\Delta\phi(\Delta)$ maximizes at $\Delta = 1$ with value at most $\frac{1}{\sqrt{2\pi e}}$. By our
assumption that $\Delta_j \ge 0$ for every $j$, we get that

$$\frac{\sigma}{2\sqrt{1 - b}}\phi(\Delta_2) = \frac{\sigma}{2\sqrt{1 - b}} \cdot \frac{\sigma\sqrt{1 - b}}{c_2 - (1 - a)\mu}\,\Delta_2\phi(\Delta_2) \le \frac{V}{2\sqrt{2\pi e}} \cdot \frac{1}{c_2 - (1 - a)\mu}.$$

Applying the same argument on $O_2$ shows the error can also be bounded by
$\frac{V}{2\sqrt{2\pi e}} \cdot \frac{1}{c_1 - a\mu}$.

However, $(c_1 - a\mu) + (c_2 - (1 - a)\mu) = c - \mu$ which is the total spare capacity,
and at least one of the bins takes spare capacity that is at least half of that,
namely $\frac{c - \mu}{2}$. Since the error is bounded by either term, we can choose the one
where the spare capacity is at least $\frac{c - \mu}{2}$ and we therefore see that the error is at
most $\frac{V}{\sqrt{2\pi e}} \cdot \frac{1}{c - \mu}$. Since we assume $c - \mu \ge \alpha\mu$ for some constant $\alpha > 0$, the error
is at most $\frac{V}{\sqrt{2\pi e}\,\alpha\mu}$. As we assume $V \le \mu^2$, $\frac{V}{\mu} \le \mu$ which completes the proof.































This shows that the approximation factor goes to 1 linearly fast in the number of
services. Thus, from a practical point of view the theorem is very satisfying.


B.2 Stochastic Packing with Min Worst Overflow Probability 
(SP-MWOP) 

The SP-MWOP problem gets as input integers $k$ and $n$, specifying the number
of bins and services, integers $c_1, \ldots, c_k$, specifying the bin capacities and values
$\{(\mu^{(i)}, V^{(i)})\}_{i=1}^{n}$, specifying that the demand distribution of service $i$ is
normal with mean $\mu^{(i)}$ and variance $V^{(i)}$. A solution to the problem is a partition
of $[n]$ to $k$ disjoint sets $S_1, \ldots, S_k \subseteq [n]$ that minimizes the worst overflow
probability.

The SP-MWOP problem is a natural variant of SBP. For a given partition
let $\mathrm{OFP}_j$ (for $j = 1, \ldots, k$) denote the overflow probability of bin $j$. Let WOFP
denote the worst overflow probability, i.e., $\mathrm{WOFP} = \max_{j=1}^{k}\{\mathrm{OFP}_j\}$. In the
SBP problem we are given $n$ normal distributions and wish to pack them into
few bins such that $\mathrm{OFP}_j \le p$ for some given parameter $p$. Suppose we
solve the SBP problem for a given $p$ and know that $k$ bins suffice. We now ask
ourselves what is the minimal WOFP achieved with the $k$ bins (this probability
is clearly at most $p$ but can also be significantly smaller). We also ask what is
the partition that achieves this minimal worst overflow probability. The problem
SP-MWOP does exactly that.

In Appendix H.1 we prove that the overflow probability of bin $j$ ($j = 1, \ldots, k$) is

$$\mathrm{OFP}_j = 1 - \Phi(\Delta_j) \quad\text{where}\quad \Delta_j = \frac{c_j - \mu_j}{\sigma_j}.$$

Thus,

$$\mathrm{WOFP} = \max_{j=1}^{k}\{1 - \Phi(\Delta_j)\}.$$

With two bins WOFP is a function from $[0,1]^2$ to $\mathbb{R}$ and $\mathrm{WOFP}(a, b) =
\max\{1 - \Phi(\Delta_1), 1 - \Phi(\Delta_2)\}$ where the first bin has mean $a\mu$ and variance $bV$,
the second bin has mean $(1 - a)\mu$ and variance $(1 - b)V$ and $\sigma_j$, $\Delta_j$ are defined
as above.


Lemma 5. WOFP respects conditions 1-4 of Section 4.1.


Proof. We go over the conditions one by one. 

item 1: Let $S_1, \ldots, S_k$ be an optimal solution. Suppose there are $j$ and $j'$ for
which $S_j, S_{j'}$ are not the optimal solution for the two-bin problem defined
by the services in $S_j \cup S_{j'}$ and capacities $c_j, c_{j'}$. Change the allocation of
the solution $S_1, \ldots, S_k$ on bins $j$ and $j'$ to an optimal solution for the two-bin
problem. This change improves the worst overflow probability of the two
bins while not affecting the overflow probability of any other bin. In total
we get a better solution, in contradiction to the optimality of $S_1, \ldots, S_k$.
item 2: The same proof as in Appendix G.2 shows $\mathrm{WOFP}(a, b) = \mathrm{WOFP}(1 -
a - \frac{c_2 - c_1}{\mu}, 1 - b)$.
item 3: Fix $b$. Denote $\mathrm{OFP}_1(a, b) = \mathrm{OFP}_{S_1}(a\mu, bV) = 1 - \Phi(\Delta_1)$. It is a simple
calculation that $\frac{\partial \mathrm{OFP}_1}{\partial a}(a, b) = \frac{\mu}{\sigma\sqrt{b}} \cdot \phi(\Delta_1) > 0$. Similarly, if $\mathrm{OFP}_2(a, b)$
denotes the overflow probability in the second bin when the first bin has
total mean $a\mu$ and total variance $bV$, then $\frac{\partial \mathrm{OFP}_2}{\partial a} = -\frac{\mu}{\sigma\sqrt{1 - b}} \cdot \phi(\Delta_2) < 0$. Thus,
$\mathrm{OFP}_1$ is monotonically increasing in $a$ and $\mathrm{OFP}_2$ is monotonically decreasing
in $a$, and therefore there is a unique minimum for $\mathrm{WOFP}(a, b)$ (when $b$
is fixed and $a$ is free) that is obtained when $\mathrm{OFP}_1(a, b) = \mathrm{OFP}_2(a, b)$, i.e.,
when $\Delta_1 = \Delta_2$.
item 4: We first explicitly determine what WOFP restricted to the valley is
as a function $D(b) = \mathrm{WOFP}(m(b), b)$ of $b$. From before we know that on the
valley $\Delta_1 = \Delta_2$. Therefore, following the same reasoning as in the SP-MED
case, $D(b) = 1 - \Phi(\Delta_1)$ with

$$\Delta_1 = \frac{c - \mu}{\sigma_1 + \sigma_2} = \frac{c - \mu}{\sigma\left(\sqrt{b} + \sqrt{1 - b}\right)}.$$

Since $\sqrt{b} + \sqrt{1 - b}$ is increasing for $b < \frac{1}{2}$ and decreasing afterwards, $\Delta_1$ is
decreasing for $b < \frac{1}{2}$ and increasing afterwards. It follows that $D(b)$ is
monotonically increasing in $b$ for $b < \frac{1}{2}$ and decreasing
otherwise. The maximal point is obtained in the saddle point that is the
center of the symmetry.

Lemma 6. The difference between minimal worst overflow probability in the 
integral point found by the sorting algorithm and the optimal integral (or frac¬ 
tional) point for SP-MWOP is at most In particular, when L = o(n) 

and $\alpha$ is a constant, the difference is $o(1)$.

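Proof. We know from Theorem 3 that the difference is at most

$$\min\{|\nabla D(\xi_1)|, |\nabla D(\xi_2)|\} \cdot \frac{L}{n}$$

where $\xi_1 = (a_1, b_1) \in [O_1, OPT_f]$, $\xi_2 = (a_2, b_2) \in [OPT_f, O_2]$ and $O_1$ and $O_2$ are
the two points on the bottom sorted path between which $OPT_f$ lies, and notice
that even though WOFP is not differentiable when $\Delta_1 = \Delta_2$, it is differentiable
everywhere else. We now use the partial derivatives calculated in Appendix G.3.
We also replace $\sigma_2$ with $\frac{c_2 - (1 - a_1)\mu}{\Delta_2}$ and similarly for the other term. We get:

$$\min\left\{ |\Delta_2\phi(\Delta_2)| \cdot \left|\left(\frac{\mu}{c_2 - (1 - a_1)\mu},\; \frac{1}{2(1 - b_1)}\right)\right|,\;\; |\Delta_1\phi(\Delta_1)| \cdot \left|\left(\frac{\mu}{c_1 - a_2\mu},\; \frac{1}{2 b_2}\right)\right| \right\} \cdot \frac{L}{n}.$$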

A<j>(A) maximizes at A = 1 with value at most . Also, (ci — a 2 p) + 

(c 2 — (1 — ai)/z) = c — /.z — (a 2 — ai)/z > c — pff > ffp, where a is the total 
space capacity, and a constant by our assumption. Hence, at least one of the 
terms is at most A Also, for that term, the spare capacity 

is maximal, and therefore it takes at least half of the variance. Altogether, the 
difference is at most O(f^) which completes the proof. 

















B.3 Stochastic Packing with Minimum Overflow Probability 
(SP-MOP) 


The SP-MOP problem gets as input integers $k$ and $n$, specifying the number
of bins and services, integers $c_1, \ldots, c_k$, specifying the bin capacities and values
$\{(\mu^{(i)}, V^{(i)})\}_{i=1}^{n}$, specifying that the demand distribution of service $i$ is
normal with mean $\mu^{(i)}$ and variance $V^{(i)}$. A solution to the problem is a partition
of $[n]$ to $k$ disjoint sets $S_1, \ldots, S_k \subseteq [n]$ that minimizes the overflow probability.

The total overflow probability is $\mathrm{OFP} = 1 - \prod_{j=1}^{k}(1 - \mathrm{OFP}_j)$ where, as we
computed before (in Appendix H.1), $\mathrm{OFP}_j = 1 - \Phi(\Delta_j)$. With two bins OFP is
a function from $[0,1]^2$ to $\mathbb{R}$ and $\mathrm{OFP}(a, b) = 1 - \Phi(\Delta_1)\Phi(\Delta_2)$ where the first bin
has mean $a\mu$ and variance $bV$, the second bin has mean $(1 - a)\mu$ and variance
$(1 - b)V$ and $\sigma_j$, $\Delta_j$ are defined as above.
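The two overflow-based costs can be computed side by side. The sketch below evaluates the per-bin overflow probability $1 - \Phi(\Delta_j)$, the worst overflow probability used by SP-MWOP, and the total overflow probability used by SP-MOP. The function names are hypothetical, each bin is passed as a `(mean, variance, capacity)` triple, and the product formula assumes, as the text does, that the bins hold independent demands.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bin_overflow(mean, var, cap):
    """OFP_j = 1 - Phi(Delta_j) with Delta_j = (cap - mean) / sigma_j."""
    return 1.0 - Phi((cap - mean) / math.sqrt(var))

def worst_overflow(bins):
    """WOFP: the largest per-bin overflow probability (SP-MWOP cost)."""
    return max(bin_overflow(m, v, c) for m, v, c in bins)

def total_overflow(bins):
    """OFP = 1 - prod_j (1 - OFP_j): some bin overflows (SP-MOP cost)."""
    no_overflow = 1.0
    for m, v, c in bins:
        no_overflow *= 1.0 - bin_overflow(m, v, c)
    return 1.0 - no_overflow
```

For any allocation the total overflow probability lies between the worst per-bin probability and the sum of the per-bin probabilities, so the two costs rank allocations similarly when one bin dominates the risk.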


Lemma 7. OFP respects conditions 1-3 of Section 4.1.




Proof. We go over the conditions one by one. 

item 1: Let $S_1, \ldots, S_k$ be an optimal solution. Suppose there are $j$ and $j'$ for
which $S_j, S_{j'}$ are not the optimal solution for the two-bin problem defined
by the services in $S_j \cup S_{j'}$ and capacities $c_j, c_{j'}$. Change the allocation of the
solution $S_1, \ldots, S_k$ on bins $j$ and $j'$ to an optimal solution for the two-bin
problem. This change improves $(1 - \mathrm{OFP}_j)(1 - \mathrm{OFP}_{j'})$ while not affecting
the overflow probability of any other bin. In total we get a better solution,
in contradiction to the optimality of $S_1, \ldots, S_k$.
item 2: The same proof as in Appendix G.2 shows $\mathrm{OFP}(a, b) = \mathrm{OFP}(1 -
a - \frac{c_2 - c_1}{\mu}, 1 - b)$.
item 3: Fix $b$.

$$\frac{\partial^2 \mathrm{OFP}}{\partial a^2} = \mu^2\left[\frac{\Delta_1}{\sigma_1^2}\phi(\Delta_1)\Phi(\Delta_2) + \frac{\Delta_2}{\sigma_2^2}\phi(\Delta_2)\Phi(\Delta_1) + \frac{2}{\sigma_1\sigma_2}\phi(\Delta_1)\phi(\Delta_2)\right].$$

In particular $\frac{\partial^2 \mathrm{OFP}}{\partial a^2} > 0$ and for every fixed $b$, $\mathrm{OFP}(a, b)$ is convex over
$a \in [0, 1]$ and has a unique minimum $a = m(b)$.


Proving there exists a unique maximum over the valley is more challenging.
We wish to find all extremum points of the cost function $D$ (OFP in our case)
over the valley $\{(m(b), b)\}$. Define $m(a, b) = a - m(b)$. Then we wish to maximize
$D(a, b)$ subject to $m(a, b) = 0$. Before, we computed the restriction $D(b)$ of the
cost function over the valley and found its extremum points. However, here we
do not know how to explicitly find $D(b)$. Instead, we use Lagrange multipliers
that allow working with the implicit form $m(a, b) = 0$ without explicitly finding
$D(b)$. We prove a general result:


Lemma 8. If a cost function $D$ is twice differentiable over $[0,1] \times [0,1]$ and
respects conditions 1-3 of Section 4.1, and if for every $b \in [0,1]$, $D(a, b)$ is
strictly convex for $a \in [0,1]$, then any extremum point $Q$ of $D$ over the valley must
have zero gradient at $Q$, i.e., $\nabla(D)(Q) = 0$.

Proof. The valley is defined by the equation $m(a, b) = 0$ where $m(a, b) = \frac{\partial D}{\partial a}(a, b)$. Using Lagrange
multipliers we find that at any extremum point $Q$ of $D$ over the valley,

$$\nabla(D)(Q) = \lambda \nabla m(Q)$$

for some real value $\lambda$. However,

$$\nabla(D)(Q) = \left(\frac{\partial D}{\partial a}(Q), \frac{\partial D}{\partial b}(Q)\right) = \left(0, \frac{\partial D}{\partial b}(Q)\right),$$

because $Q$ is on the valley.

Also, as for every fixed $b$, $D(a, b)$ is strictly convex, we have that

$$\frac{\partial m}{\partial a}(Q) = \frac{\partial^2 D}{\partial a^2}(Q) > 0.$$

Therefore, we conclude that $\lambda = 0$. This implies that $\frac{\partial D}{\partial b}(Q) = 0$. Hence,
$\nabla(D)(Q) = 0$.


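With that we prove:

Lemma 9. OFP respects condition (4) of Section 4.1.

Proof. Let $Q = (a, b)$ be an extremum point of OFP over the valley. We look at
the range $b \in [0, \frac{1}{2})$; the case $b > \frac{1}{2}$ is obtained by the symmetry. Then, by Lemma 8,

$$\phi(\Delta_1)\Phi(\Delta_2)\frac{\partial \Delta_1}{\partial a} = -\Phi(\Delta_1)\phi(\Delta_2)\frac{\partial \Delta_2}{\partial a}, \text{ and}$$

$$\phi(\Delta_1)\Phi(\Delta_2)\frac{\partial \Delta_1}{\partial b} = -\Phi(\Delta_1)\phi(\Delta_2)\frac{\partial \Delta_2}{\partial b}.$$

Dividing the two equations we get

$$\frac{\partial \Delta_1}{\partial a}\frac{\partial \Delta_2}{\partial b} = \frac{\partial \Delta_2}{\partial a}\frac{\partial \Delta_1}{\partial b}.$$

Plugging the partial derivatives of $\Delta_j$ by $a$ and $b$, we get the equation

$$\frac{\Delta_1}{\Delta_2} = \frac{\sqrt{b}}{\sqrt{1 - b}}.$$

As $b < \frac{1}{2}$, $b < 1 - b$ and we conclude that at $Q$, $\Delta_1 < \Delta_2$. However, using the
log-concavity of the normal CDF $\Phi$ we prove in Appendix I that:

Lemma 10. $\frac{\partial \mathrm{OFP}}{\partial a}(Q) = 0$ at a point $Q = (a, b)$ with $b < \frac{1}{2}$ implies $\Delta_1 > \Delta_2$.

Together, this implies that the only extremum point of OFP over the valley
is at $b = \frac{1}{2}$. However, at $b = 0$, the best is to fill the largest bin to full capacity
with variance 0, and thus, $\mathrm{OFP}(m(0), 0) = 1 - \Phi(\Delta)$ where $\Delta = \frac{c - \mu}{\sigma}$. On the
other hand, at $b = \frac{1}{2}$,

$$\mathrm{OFP}\left(a, \tfrac{1}{2}\right) = 1 - \Phi\left(\frac{c_1 - a\mu}{\sigma/\sqrt{2}}\right)\Phi\left(\frac{c_2 - (1 - a)\mu}{\sigma/\sqrt{2}}\right).$$

As $(c_1 - a\mu) + (c_2 - (1 - a)\mu) = c - \mu$, either $c_1 - a\mu$ or $c_2 - (1 - a)\mu$ is at most
$\frac{c - \mu}{2}$ and therefore $\Phi(\Delta_1)\Phi(\Delta_2) \le \Phi\left(\frac{\sqrt{2}(c - \mu)}{2\sigma}\right) = \Phi\left(\frac{c - \mu}{\sqrt{2}\sigma}\right) < \Phi\left(\frac{c - \mu}{\sigma}\right)$.

We conclude that $\mathrm{OFP}(a, \frac{1}{2}) > \mathrm{OFP}(m(0), 0)$ and there is a unique maximum
point on the valley and it is obtained at $b = \frac{1}{2}$.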

C Bounding the approximation error of the sorting 
algorithm 

We assume that no input service is too dominant. Recall that we represent service
$i$ with the point $P^{(i)} = (a^{(i)}, b^{(i)})$ and $P^{(1)} + P^{(2)} + \ldots + P^{(n)} = (1, 1)$. Thus,
$\sum_i |P^{(i)}| \ge |(1, 1)| = \sqrt{2}$ (by the triangle inequality) and $\sum_i |P^{(i)}| \le 2$ (because
the length of the longest increasing path from $(0, 0)$ to $(1, 1)$ is obtained by the
path going from $(0, 0)$ to $(1, 0)$ and then to $(1, 1)$). Hence, the average length
of an input point $P^{(i)}$ is somewhere between $\frac{\sqrt{2}}{n}$ and $\frac{2}{n}$. Our assumption states
that no element takes more than $L$ times its "fair" share, i.e., that for every $i$,
$|P^{(i)}| \le \frac{L}{n}$. With that we prove:

Theorem 3. Let $OPT_f$ be the fractional optimal solution. If $D$ is differentiable,
the difference between the cost on the integral point found by the sorting
algorithm and the cost on the optimal integral (or fractional) point is at most
$\min\{|\nabla D(\xi_1)|, |\nabla D(\xi_2)|\} \cdot \frac{L}{n}$ where $\xi_1 \in [O_1, OPT_f]$, $\xi_2 \in [OPT_f, O_2]$ and $O_1$
and $O_2$ are the two points on the bottom sorted path between which $OPT_f$ lies.

Proof. Suppose we run the sorting algorithm on some input. Let $OPT_{int}$ be the
integral optimal solution, $OPT_f$ the fractional optimal solution and $OPT_{sort}$
the integral point the sorting algorithm finds on the bottom sorted path. We
wish to bound $D(OPT_{sort}) - D(OPT_{int})$ and clearly it is at most $D(OPT_{sort}) -
D(OPT_f)$. We now look at the two points $O_1$ and $O_2$ on the bottom sorted path
between which $OPT_f$ lies (and notice that as far as we know it is possible that
$OPT_{sort}$ is none of these points). Since $D(OPT_f) \le D(OPT_{sort}) \le D(O_1)$ and
$D(OPT_f) \le D(OPT_{sort}) \le D(O_2)$ the error the sorting algorithm makes is at
most

$$\min\{D(O_1) - D(OPT_f),\; D(O_2) - D(OPT_f)\}.$$

We now apply the mean value theorem and use our assumption that for every $i$,
$|P^{(i)}| \le \frac{L}{n}$.

We define a new system constant, relative spare capacity, denoted by $\alpha$, which
equals $\frac{c - \mu}{\mu}$, i.e., it expresses the spare capacity as a fraction of the total mean.
We assume that the system has some constant (possibly small) relative spare
capacity. Also, we only consider solutions where each bin is allocated services with
total mean not exceeding its capacity. Equivalently, we only consider solutions
where $\Delta_j \ge 0$ for every $1 \le j \le k$. We will later see that under these conditions
the sorting algorithm solves all three cost functions with a small error going fast
to zero with $n$.

We remark that in fact the proof shows something stronger: the deviation of 
any (not necessarily optimal) fractional solution on the bottom sorted path, is 
close to the deviation of the integral solution to the left or to the right of it on 
the bottom sorted path. 

D Proof of Lemma 1

Proof. We introduce some notation. Let $\tau = \tau_1, \ldots, \tau_n$ be a sequence of $n$
elements that is a reordering of $\{1, \ldots, n\}$. We associate with $\tau$ the $n$ partial sums
$P_\tau^{[1]}, \ldots, P_\tau^{[n]}$ where $P_\tau^{[i]}$ is $\sum_{j=1}^{i} P^{(\tau_j)}$, i.e., $P_\tau^{[i]}$ is the integral point that is the
sum of the first $i$ points according to the sequence $\tau$. We also define $P_\tau^{[0]} = (0, 0)$
and note that $P_\tau^{[n]} = (1, 1)$. The curve connecting $\tau$ is the curve that is formed by
connecting $P_\tau^{[i]}$ and $P_\tau^{[i+1]}$ with a line, for $i = 0, \ldots, n - 1$.

Assume that in the sequence $\tau = \tau_1, \ldots, \tau_n$ there is some index $i$ such that
the VMR of $P^{(\tau_i)}$ is larger than the VMR of $P^{(\tau_{i+1})}$. Consider the sequence
$\tau'$ that is the same as $\tau$ except for switching the order of $\tau_i$ and $\tau_{i+1}$, i.e.,
$\tau' = \tau_1, \ldots, \tau_{i-1}, \tau_{i+1}, \tau_i, \tau_{i+2}, \ldots, \tau_n$. We claim that the curve connecting $\tau'$ lies
beneath the curve connecting $\tau$. To see that, notice that both curves are the same
up to the point $P_\tau^{[i-1]}$. There, the two paths split: $\tau$ adds $P^{(\tau_i)}$ and then $P^{(\tau_{i+1})}$
while $\tau'$ first adds $P^{(\tau_{i+1})}$ and then $P^{(\tau_i)}$. Then the two curves coincide and
overlap all the way to $(1, 1)$. In the section where the two paths differ, the two
different paths form a parallelogram with $P^{(\tau_i)}$ and $P^{(\tau_{i+1})}$ as two neighboring
edges of the parallelogram. As the angle $P^{(\tau_{i+1})}$ has with the $a$ axis is smaller
than the angle $P^{(\tau_i)}$ has with the $a$ axis, the curve connecting $\tau'$ goes beneath
that of $\tau$.

To finish the argument, let $P_I$ be an arbitrary integral point for some $I \subseteq [n]$.
Look at the sequence $\tau$ that starts with the elements of $I$ followed by the elements
of $[n] \setminus I$ in an arbitrary order. Notice that $P_I$ lies on the curve connecting $\tau$. Now
run a bubble sort on $\tau$, each time ordering a pair of elements by their VMR.
Notice that the process terminates with the sequence that sorts the elements
by their VMR and the curve connecting the final sequence is the bottom sorted
path. Thus, we see that the bottom sorted path lies beneath the curve connecting
$\tau$, and in particular $P_I$ lies above the bottom sorted path. A similar argument
shows $P_I$ lies underneath the upper sorted path.
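The statement can be checked by brute force on a small instance. The sketch below normalizes each service to a point $(a^{(i)}, b^{(i)})$, builds the bottom sorted path from the prefix sums in increasing VMR order, and lets a caller verify that every subset sum $P_I$ lies on or above that path. All function names are hypothetical, and enumerating all subsets is only practical for a handful of services.

```python
from itertools import combinations

def normalize(services):
    """Map (mean, variance) pairs to points (a, b) that sum to (1, 1)."""
    mu = sum(m for m, _ in services)
    V = sum(v for _, v in services)
    return [(m / mu, v / V) for m, v in services]

def sorted_path(points):
    """Vertices of the bottom sorted path: prefix sums in increasing b/a order."""
    order = sorted(points, key=lambda p: p[1] / p[0])
    path = [(0.0, 0.0)]
    for a, b in order:
        path.append((path[-1][0] + a, path[-1][1] + b))
    return path

def path_height(path, a):
    """b-coordinate of the piecewise-linear path at abscissa a."""
    for (a0, b0), (a1, b1) in zip(path, path[1:]):
        if a <= a1:
            return b0 + (b1 - b0) * (a - a0) / (a1 - a0)
    return path[-1][1]  # a is at (or marginally beyond) the right end

def subset_points(points):
    """Yield the integral point P_I for every subset I of the services."""
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            yield (sum(a for a, _ in subset), sum(b for _, b in subset))
```

A check such as `all(b >= path_height(path, a) - 1e-12 for a, b in subset_points(pts))` should hold for any input, since sorting by VMR orders the segments by increasing slope.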

E Proof of Theorem 1: The optimal solution is on the
bottom sorted path

Proof. Consider an arbitrary fractional point $(a_0, b_0)$ lying strictly inside the
polygon confined by the upper and bottom sorted paths. If $b_0 \le \frac{1}{2}$, then by
keeping $b = b_0$ constant and changing $a$ till it reaches the valley we strictly
decrease cost (because $D$ is strictly monotone in this range). Now, when changing
$a$ we either hit the bottom sorted path or the valley. If we hit the bottom sorted
path, we found a point on the bottom sorted path with less cost and we are
done. If we hit the valley, we can go down the valley until we hit the bottom
sorted path and again we are done (as $D$ is strictly monotone on the valley).

If $b_0 > \frac{1}{2}$ we recall two facts that we already know:

— The point $(1 - a_0, 1 - b_0) = \psi(a_0, b_0)$ is fractional (since $(a_0, b_0)$ is fractional
and $\psi$ maps fractional points to fractional points), and,

— By the reflection symmetry we know that $D(a_0, b_0) = D(1 - a_0 - \zeta, 1 - b_0)$
where $\zeta = \frac{c_2 - c_1}{\mu} \ge 0$.

Now, $(1 - a_0 - \zeta, 1 - b_0)$ has $b$ coordinate that is at most $\frac{1}{2}$. Also $(1 - a_0 - \zeta, 1 - b_0)$
lies to the left of the fractional point $(1 - a_0, 1 - b_0)$ (since $\zeta \ge 0$) and therefore
it lies above the bottom sorted path. We therefore see that the point $(a_0, b_0)$ has
a corresponding fractional point with the same cost and with $b$ coordinate at
most $\frac{1}{2}$. Applying the argument that appears in the first paragraph of the proof
we conclude that there exists some point on the bottom sorted path with less
cost, and conclude the proof.

F Standard Normal Distribution 

The probability density function of the standard normal distribution (that has
mean 0 and variance 1) is $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$ and the cumulative distribution
function is $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt$. Clearly, $\Phi'(x) = \phi(x)$. Also, $\phi'(x) = -\frac{x}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} =
-x\phi(x)$. The second derivative is $\phi''(x) = -\phi(x) + x^2\phi(x) = (x^2 - 1)\phi(x)$.

G SP-MED 

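G.1 Expected Deviation of a single bin

By definition the expected deviation of a single bin is

$$\mathrm{Dev}_{S_j} = \frac{1}{\sigma_j\sqrt{2\pi}} \int_{c_j}^{\infty} (x - c_j)\, e^{-\frac{(x - \mu_j)^2}{2\sigma_j^2}}\, dx.$$

Doing the variable change $t = \frac{x - \mu_j}{\sigma_j}$ and then the variable change
$u = \frac{t^2}{2}$ we get:

$$\mathrm{Dev}_{S_j} = \frac{1}{\sqrt{2\pi}} \int_{\Delta_j}^{\infty} (\sigma_j t + \mu_j - c_j)\, e^{-\frac{t^2}{2}}\, dt = (\mu_j - c_j)\left[1 - \Phi(\Delta_j)\right] + \sigma_j\phi(\Delta_j)$$

$$= -\sigma_j\Delta_j\left[1 - \Phi(\Delta_j)\right] + \sigma_j\phi(\Delta_j) = \sigma_j\left[\phi(\Delta_j) - \Delta_j(1 - \Phi(\Delta_j))\right].$$

Denoting $g(\Delta) = \phi(\Delta) - \Delta[1 - \Phi(\Delta)]$ we see that $\mathrm{Dev}_{S_j} = \sigma_j\, g(\Delta_j)$.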



G.2 Symmetry 


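Claim. $\mathrm{Dev}(a, b) = \mathrm{Dev}(1 - a - \frac{c_2 - c_1}{\mu}, 1 - b)$.

Proof. Let us define $\sigma_1(b) = \sqrt{b}\,\sigma$, $\sigma_2(b) = \sqrt{1 - b}\,\sigma$, $\Delta_1(a, b) = \frac{c_1 - a\mu}{\sigma_1(b)}$ and
$\Delta_2(a, b) = \frac{c_2 - (1 - a)\mu}{\sigma_2(b)}$. We know that $\mathrm{Dev}(a, b) = \sigma_1(b)\, g(\Delta_1(a, b)) + \sigma_2(b)\, g(\Delta_2(a, b))$.
To prove the claim it is enough to show that the following four equations hold:
$\sigma_1(b) = \sigma_2(1 - b)$, $\sigma_2(b) = \sigma_1(1 - b)$, $\Delta_1(a, b) = \Delta_2(1 - a - \frac{c_2 - c_1}{\mu}, 1 - b)$ and
$\Delta_2(a, b) = \Delta_1(1 - a - \frac{c_2 - c_1}{\mu}, 1 - b)$.

Indeed, $\sigma_1(1 - b) = \sqrt{1 - b}\,\sigma = \sigma_2(b)$ and similarly $\sigma_2(1 - b) = \sigma_1(b)$. Also,

$$\Delta_2\left(1 - a - \frac{c_2 - c_1}{\mu}, 1 - b\right) = \frac{c_2 - \left(1 - \left(1 - a - \frac{c_2 - c_1}{\mu}\right)\right)\mu}{\sigma_2(1 - b)} = \frac{c_1 - a\mu}{\sigma_1(b)} = \Delta_1(a, b).$$

A similar check shows that $\Delta_1(1 - a - \frac{c_2 - c_1}{\mu}, 1 - b) = \Delta_2(a, b)$.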
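The reflection symmetry is easy to confirm numerically. The sketch below evaluates the two-bin expected deviation directly from its definition and compares it with its value at the reflected point. The function names `dev` and `reflect` and the sample parameters are illustrative assumptions; the formula is applied for any real $a$, including reflected values that fall outside $[0, 1]$.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def g(delta):
    """g(Delta) = phi(Delta) - Delta * (1 - Phi(Delta))."""
    return phi(delta) - delta * (1.0 - Phi(delta))

def dev(a, b, mu, V, c1, c2):
    """Two-bin expected deviation Dev(a, b); requires 0 < b < 1."""
    s1 = math.sqrt(b * V)           # sigma_1 = sqrt(b) * sigma
    s2 = math.sqrt((1.0 - b) * V)   # sigma_2 = sqrt(1 - b) * sigma
    d1 = (c1 - a * mu) / s1
    d2 = (c2 - (1.0 - a) * mu) / s2
    return s1 * g(d1) + s2 * g(d2)

def reflect(a, b, mu, c1, c2):
    """Image of (a, b) under the reflection (1 - a - (c2 - c1)/mu, 1 - b)."""
    return 1.0 - a - (c2 - c1) / mu, 1.0 - b
```

With, say, `mu=100`, `V=400`, `c1=50`, `c2=70`, the values `dev(a, b, ...)` and `dev(*reflect(a, b, 100, 50, 70), ...)` should coincide up to rounding for every admissible `(a, b)`.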


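G.3 The partial derivatives of Dev

The system has absolute constants $\mu$ and $V$, $\sigma = \sqrt{V}$. We let $\sigma_1 = \sqrt{b}\,\sigma$ and
$\sigma_2 = \sqrt{1 - b}\,\sigma$. We let $\Delta_1 = \frac{c_1 - \mu_1}{\sigma_1} = \frac{c_1 - a\mu}{\sigma_1}$ and $\Delta_2 = \frac{c_2 - \mu_2}{\sigma_2} = \frac{c_2 - (1 - a)\mu}{\sigma_2}$. In
this notation, $\mathrm{Dev}(a, b) = \sigma_1 g(\Delta_1) + \sigma_2 g(\Delta_2)$.

We calculate the $g$ derivatives. Since $g(\Delta) = \phi(\Delta) + \Delta\Phi(\Delta) - \Delta$ we have that
$g'(\Delta) = \phi'(\Delta) + \Phi(\Delta) + \Delta\phi(\Delta) - 1 = -\Delta\phi(\Delta) + \Phi(\Delta) + \Delta\phi(\Delta) - 1 = -(1 - \Phi(\Delta))$
and $g''(\Delta) = \phi(\Delta)$. As $\Phi(\Delta) < 1$, $g'(\Delta) < 0$ and $g$ is monotonically decreasing.
As $g''(\Delta) > 0$, $g$ is convex.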


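Deriving by a. We have: $\frac{\partial \Delta_1}{\partial a} = -\frac{\mu}{\sigma_1}$ and $\frac{\partial \Delta_2}{\partial a} = \frac{\mu}{\sigma_2}$. Also,

$$\frac{\partial \mathrm{Dev}}{\partial a}(a, b) = \sigma_1\frac{\partial g(\Delta_1)}{\partial \Delta_1}\frac{\partial \Delta_1}{\partial a} + \sigma_2\frac{\partial g(\Delta_2)}{\partial \Delta_2}\frac{\partial \Delta_2}{\partial a}$$

$$= -\sigma_1\frac{\mu}{\sigma_1}\left[\Phi(\Delta_1) - 1\right] + \sigma_2\frac{\mu}{\sigma_2}\left[\Phi(\Delta_2) - 1\right] = \mu\left[\Phi(\Delta_2) - \Phi(\Delta_1)\right].$$

Since $\Phi$ is monotonic, a zero value is achieved when $\Delta_1 = \Delta_2$. Deriving again
by $a$:

$$\frac{\partial^2 \mathrm{Dev}}{\partial a^2} = \mu\left[\frac{\partial \Phi(\Delta_2)}{\partial \Delta_2}\frac{\partial \Delta_2}{\partial a} - \frac{\partial \Phi(\Delta_1)}{\partial \Delta_1}\frac{\partial \Delta_1}{\partial a}\right] = \mu^2\left[\frac{\phi(\Delta_2)}{\sigma_2} + \frac{\phi(\Delta_1)}{\sigma_1}\right].$$

Since $\frac{\partial^2 \mathrm{Dev}}{\partial a^2} > 0$, it follows that for any $0 < b < 1$, $\mathrm{Dev}(a)$ is convex and
hence $\Delta_1 = \Delta_2$ is a minimum point in this range.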



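Deriving by b. We have: $\frac{\partial \sigma_1}{\partial b} = \frac{\sigma}{2\sqrt{b}}$ and $\frac{\partial \sigma_2}{\partial b} = -\frac{\sigma}{2\sqrt{1 - b}}$. Also, $\frac{\partial \Delta_1}{\partial b} = -\frac{\Delta_1}{2b}$ and
$\frac{\partial \Delta_2}{\partial b} = \frac{\Delta_2}{2(1 - b)}$.

Now,

$$\frac{\partial \mathrm{Dev}}{\partial b} = \frac{\partial \sigma_1}{\partial b} g(\Delta_1) + \sigma_1\frac{\partial g(\Delta_1)}{\partial \Delta_1}\frac{\partial \Delta_1}{\partial b} + \frac{\partial \sigma_2}{\partial b} g(\Delta_2) + \sigma_2\frac{\partial g(\Delta_2)}{\partial \Delta_2}\frac{\partial \Delta_2}{\partial b}$$

$$= \frac{\sigma}{2}\left[\frac{\phi(\Delta_1)}{\sqrt{b}} - \frac{\phi(\Delta_2)}{\sqrt{1 - b}}\right] = \frac{\sigma^2}{2}\left[\frac{\phi(\Delta_1)}{\sigma_1} - \frac{\phi(\Delta_2)}{\sigma_2}\right].$$

Deriving a second time we get:

$$\frac{\partial^2 \mathrm{Dev}}{\partial b^2} = \frac{\sigma}{2}\left[\frac{1}{\sqrt{b}}\frac{\partial \phi(\Delta_1)}{\partial \Delta_1}\frac{\partial \Delta_1}{\partial b} - \frac{1}{2b\sqrt{b}}\phi(\Delta_1) - \frac{1}{\sqrt{1 - b}}\frac{\partial \phi(\Delta_2)}{\partial \Delta_2}\frac{\partial \Delta_2}{\partial b} - \frac{1}{2(1 - b)\sqrt{1 - b}}\phi(\Delta_2)\right]$$

$$= \frac{\sigma}{4}\left[\frac{\phi(\Delta_1)}{b\sqrt{b}}\left(\Delta_1^2 - 1\right) + \frac{\phi(\Delta_2)}{(1 - b)\sqrt{1 - b}}\left(\Delta_2^2 - 1\right)\right].$$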

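The Mixed Derivative.

$$\frac{\partial^2 \mathrm{Dev}}{\partial a\,\partial b} = \frac{\partial}{\partial b}\frac{\partial \mathrm{Dev}}{\partial a} = \mu\left[\frac{\partial \Phi(\Delta_2)}{\partial \Delta_2}\frac{\partial \Delta_2}{\partial b} - \frac{\partial \Phi(\Delta_1)}{\partial \Delta_1}\frac{\partial \Delta_1}{\partial b}\right] = \mu\left[\frac{\Delta_1\phi(\Delta_1)}{2b} + \frac{\Delta_2\phi(\Delta_2)}{2(1 - b)}\right].$$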

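The saddle point. The Hessian of the function Dev contains the second order
partial derivatives of Dev, i.e., it is a $2 \times 2$ matrix $H$,

$$H(a, b) = \begin{pmatrix} \frac{\partial^2 \mathrm{Dev}}{\partial a^2}(a, b) & \frac{\partial^2 \mathrm{Dev}}{\partial b\,\partial a}(a, b) \\ \frac{\partial^2 \mathrm{Dev}}{\partial a\,\partial b}(a, b) & \frac{\partial^2 \mathrm{Dev}}{\partial b^2}(a, b) \end{pmatrix}.$$

$H$ is symmetric and therefore at any point $(a, b)$ the matrix $H(a, b)$ has two real
eigenvalues. Now, assume the first order derivatives of Dev vanish at some point
$(a_0, b_0)$. If the Hessian at that point $(a_0, b_0)$ has negative determinant, then the
product of these two eigenvalues is negative, and hence one is positive and the
other negative. It then follows that the point $(a_0, b_0)$ is a saddle point of Dev.

We now compute the Hessian at the point $(a_0, b_0) = (\frac{1}{2} - \frac{c_2 - c_1}{2\mu}, \frac{1}{2})$ which is
the center of the reflection symmetry. We know that at this point $\Delta_1 = \Delta_2$ and
$\sigma_1 = \sigma_2$. It then follows that:

$$H(a_0, b_0) = \begin{pmatrix} 2\sqrt{2}\,\mu^2\,\frac{\phi(\Delta_1)}{\sigma} & 2\mu\Delta_1\phi(\Delta_1) \\ 2\mu\Delta_1\phi(\Delta_1) & \sqrt{2}\,\sigma\,\phi(\Delta_1)\left[\Delta_1^2 - 1\right] \end{pmatrix}.$$

We can now compute the determinant:

$$\det[H(a_0, b_0)] = \frac{\partial^2 \mathrm{Dev}}{\partial a^2}(a_0, b_0)\,\frac{\partial^2 \mathrm{Dev}}{\partial b^2}(a_0, b_0) - \left[\frac{\partial^2 \mathrm{Dev}}{\partial a\,\partial b}(a_0, b_0)\right]^2$$

$$= 2\sqrt{2}\,\mu^2\,\frac{\phi(\Delta_1)}{\sigma}\cdot\sqrt{2}\,\sigma\,\phi(\Delta_1)\left[\Delta_1^2 - 1\right] - 4\mu^2\Delta_1^2\left[\phi(\Delta_1)\right]^2$$

$$= -4\mu^2\left[\phi(\Delta_1)\right]^2 < 0.$$

As $\det[H(a_0, b_0)] < 0$ and the partial derivatives of the first order vanish at
$(a_0, b_0)$ we conclude that $(a_0, b_0)$ is a saddle point.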

H SP-MWOP 

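H.1 Expected overflow probability of a single bin

The overflow probability of bin $j$, denoted by $\mathrm{OFP}_{S_j}$, is

$$\mathrm{OFP}_{S_j}(\mu_j, V_j) = \frac{1}{\sigma_j\sqrt{2\pi}} \int_{c_j}^{\infty} e^{-\frac{(x - \mu_j)^2}{2\sigma_j^2}}\, dx.$$

Substituting $t = \frac{x - \mu_j}{\sigma_j}$ we get

$$\mathrm{OFP}_{S_j}(\mu_j, V_j) = \frac{1}{\sqrt{2\pi}} \int_{\frac{c_j - \mu_j}{\sigma_j}}^{\infty} e^{-\frac{t^2}{2}}\, dt = 1 - \Phi\left(\frac{c_j - \mu_j}{\sigma_j}\right) = 1 - \Phi(\Delta_j).$$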

I SP-MOP 

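I.1 Proof of Lemma 10

Proof. The condition $\frac{\partial \mathrm{OFP}}{\partial a}(Q) = 0$ is equivalent to

$$\frac{\phi(\Delta_1)}{\Phi(\Delta_1)} = \frac{\sigma_1}{\sigma_2}\cdot\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}.$$

As $b < \frac{1}{2}$, $b < 1 - b$ and $\sigma_1 < \sigma_2$. Hence,

$$\frac{\phi(\Delta_1)}{\Phi(\Delta_1)} < \frac{\phi(\Delta_2)}{\Phi(\Delta_2)}.$$

Denote $h(\Delta) = \frac{\phi(\Delta)}{\Phi(\Delta)}$. We will prove that $h$ is monotone decreasing, and this
implies that $\Delta_1 > \Delta_2$.

To see that $h$ is monotone decreasing define $H(\Delta) = \ln(\Phi(\Delta))$. Then $h = H'$.
Therefore, $h' = H''$. However, $\Phi$ is log-concave, hence $H'' < 0$. We conclude that
$h' < 0$ and $h$ is monotone decreasing.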
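The monotonicity of $h = \phi/\Phi$ can also be observed numerically. The sketch below evaluates $h$ on a grid; the grid bounds and the function names are illustrative assumptions, and the check is a sanity test of the claim, not a replacement for the log-concavity argument.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF; erfc keeps the left tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def h(x):
    """h(Delta) = phi(Delta) / Phi(Delta), the derivative of ln(Phi)."""
    return phi(x) / Phi(x)

def is_decreasing_on_grid(lo=-5.0, hi=5.0, step=0.25):
    """True if h strictly decreases between consecutive grid points."""
    n = int(round((hi - lo) / step))
    values = [h(lo + i * step) for i in range(n + 1)]
    return all(later < earlier for earlier, later in zip(values, values[1:]))
```

At $\Delta = 0$ the ratio equals $2\phi(0) = \sqrt{2/\pi}$, and it decays towards zero as $\Delta$ grows.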




















J Unbalancing bin capacities is always better 

Suppose we are given a capacity budget $c$ and we have the freedom to choose
capacities $c_1, c_2$ that sum up to $c$ for two bins. Which choice is the best?
Offhand, it is possible that for each input there is a different choice of $c_1$ and $c_2$
that minimizes the expected deviation. In contrast, we show that the minimum
expected deviation always decreases as the difference $c_2 - c_1$ increases.

Lemma 11. Given a capacity budget $c$, the minimum expected deviation
decreases as $c_2 - c_1$ increases. In particular the best choice is having a single bin
with capacity $c$ and the worst choice is splitting the capacities evenly between the
two bins.

Proof. Recall that $\Delta_1(a, b) = \frac{c_1 - a\mu}{\sigma\sqrt{b}}$ and $\Delta_2(a, b) = \frac{c_2 - (1 - a)\mu}{\sigma\sqrt{1 - b}}$. Therefore, if we
reduce $c_1$ by $\tilde{c}$ and increase $c_2$ by $\tilde{c}$, we get

$$\tilde{\Delta}_1(a, b) = \frac{c_1 - \tilde{c} - a\mu}{\sigma\sqrt{b}} = \frac{c_1 - \left(a + \frac{\tilde{c}}{\mu}\right)\mu}{\sigma\sqrt{b}} = \Delta_1\left(a + \frac{\tilde{c}}{\mu}, b\right).$$

Similarly, $\tilde{\Delta}_2(a, b) = \Delta_2(a + \frac{\tilde{c}}{\mu}, b)$. Let $\mathrm{Dev}_{c_1, c_2}(a, b)$ denote the expected
deviation with bin capacities $c_1, c_2$. As $\mathrm{Dev}(a, b) = \sigma_1(b)\, g(\Delta_1(a, b)) + \sigma_2(b)\, g(\Delta_2(a, b))$
we see that

$$\mathrm{Dev}_{c_1 - \tilde{c},\, c_2 + \tilde{c}}(a, b) = \mathrm{Dev}_{c_1, c_2}\left(a + \frac{\tilde{c}}{\mu}, b\right),$$

i.e., the graph is shifted left by $\frac{\tilde{c}}{\mu}$.

Notice that the bottom sorted path does not depend on the bin capacities
and is the same in both cases. Let $(a, b)$ be the optimal fractional solution for bin
capacities $c_1, c_2$. We know that $(a, b)$ is on the bottom sorted path. Let $\tilde{a} = a - \frac{\tilde{c}}{\mu}$.
We know that $\mathrm{Dev}_{c_1 - \tilde{c},\, c_2 + \tilde{c}}(\tilde{a}, b) = \mathrm{Dev}_{c_1, c_2}(a, b)$. The point $(\tilde{a}, b)$ lies to the left
of the bottom sorted path and therefore above it. As the optimal solution for
bin capacities $c_1 - \tilde{c}, c_2 + \tilde{c}$ is also on the bottom sorted path and is strictly
better than any internal point, we conclude that the expected deviation for bin
capacities $c_1 - \tilde{c}, c_2 + \tilde{c}$ is strictly smaller than the expected deviation for bin
capacities $c_1, c_2$.
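The key step of the proof is that moving an amount of capacity from the first bin to the second has the same effect as evaluating the original cost at a shifted $a$. The sketch below checks this identity numerically, with the shift written explicitly as `shift` and the identity taken in the form $\mathrm{Dev}_{c_1 - s,\, c_2 + s}(a, b) = \mathrm{Dev}_{c_1, c_2}(a + s/\mu, b)$. The names `dev_with_caps` and `shifted_pair` and the sample parameters are hypothetical.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def g(delta):
    """g(Delta) = phi(Delta) - Delta * (1 - Phi(Delta))."""
    return phi(delta) - delta * (1.0 - Phi(delta))

def dev_with_caps(a, b, mu, V, c1, c2):
    """Expected deviation Dev_{c1,c2}(a, b) of the two-bin split; 0 < b < 1."""
    s1 = math.sqrt(b * V)
    s2 = math.sqrt((1.0 - b) * V)
    return (s1 * g((c1 - a * mu) / s1)
            + s2 * g((c2 - (1.0 - a) * mu) / s2))

def shifted_pair(a, b, mu, V, c1, c2, shift):
    """Both sides of the identity: capacities moved by `shift` vs. a moved by shift/mu."""
    moved_caps = dev_with_caps(a, b, mu, V, c1 - shift, c2 + shift)
    moved_point = dev_with_caps(a + shift / mu, b, mu, V, c1, c2)
    return moved_caps, moved_point
```

For instance, `shifted_pair(0.4, 0.3, 100.0, 400.0, 55.0, 65.0, 5.0)` should return two values that agree up to rounding, which is the "graph shifted left" statement in code form.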


An immediate corollary is the trivial fact that putting all the capacity budget 
in one bin is best. Obviously, this is not always possible nor desirable, but if there 
is tolerance in each bin capacity, we recommend minimizing the number of bins. 



